
I am currently researching the best methods to integrate i18n into projects.

There are several methods I have thought of. The first is a database schema that stores the strings and their locale, but selecting the strings would be awkward, because I would not like to perform queries like these:

SELECT text FROM locales WHERE locale = 'en_GB' AND text_id = 245543


SELECT text FROM locales WHERE locale = 'en_GB' AND text_primary = 'hello'

The next method would be to store them in files such as locales/en_gb/login/strings.php and then access them via a class developed specifically for that, like so:

$Language = Registry::Construct('Language',array('en_GB'));
echo $Language->login->strings->hello;

The issue with this is that I would have to build a system that updates these files via an administration panel, which is very time-consuming: not just building the system to manage the strings, but actually managing the strings as the site grows.

  • What other methods are there that would be beneficial for a large system?
  • Is there any automated way to do the translation as such?
  • Should I stick with a database method and build a system for users to translate strings, with ratings and the option to suggest a better version?
  • What systems have you tried in the past, and should I look into them or avoid them entirely?



In addition to gettext, already mentioned, PHP 5.3 has native internationalization support.

If that's not an option, consider using Zend Framework's Zend_Translate, Zend_Locale and related components for that. Zend_Translate supports a number of adapters, including but not limited to simple arrays, gettext, XmlTm and others.
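As a rough illustration of the "simple arrays" style of adapter mentioned above, here is a minimal sketch in Python (chosen only for brevity; it is not Zend_Translate's API). The catalog contents, the key names and the `translate` helper are all hypothetical.

```python
# Hypothetical in-memory catalogs; a real project would load these from
# per-locale files or from compiled gettext catalogs.
CATALOGS = {
    "en_GB": {"login.hello": "Hello", "login.submit": "Sign in"},
    "de_DE": {"login.hello": "Hallo"},
}

DEFAULT_LOCALE = "en_GB"


def translate(key, locale, catalogs=CATALOGS, default_locale=DEFAULT_LOCALE):
    """Look up key in locale, then in the default locale, then return the key."""
    for candidate in (locale, default_locale):
        text = catalogs.get(candidate, {}).get(key)
        if text is not None:
            return text
    return key  # an untranslated key stays visible instead of raising


print(translate("login.hello", "de_DE"))   # Hallo
print(translate("login.submit", "de_DE"))  # Sign in (falls back to en_GB)
```

Falling back to a default locale means a partially translated catalog still produces readable output while translators catch up.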

Tuesday, November 29, 2022

Although the URL itself only allows US-ASCII characters, you can use Unicode characters in the URI path if you encode them as UTF-8 and then convert the resulting bytes to US-ASCII characters using percent-encoding:

A system that internally provides identifiers in the form of a different character encoding, such as EBCDIC, will generally perform character translation of textual identifiers to UTF-8 [STD63] (or some other superset of the US-ASCII character encoding) at an internal interface, thereby providing more meaningful identifiers than those resulting from simply percent-encoding the original octets.

So you can do something like this (assuming UTF-8):

$title = 'أبجد هوز';
$path = '/product/'.rawurlencode($title);
echo $path;  // "/product/%D8%A3%D8%A8%D8%AC%D8%AF%20%D9%87%D9%88%D8%B2"

Although the URI path is actually encoded with the percent-encoding, most modern browsers will display the characters this sequence represents in Unicode when UTF-8 is used.
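For comparison, the same round trip can be sketched outside PHP. The snippet below uses Python's standard `urllib.parse`; passing `safe=""` is my assumption for mirroring `rawurlencode`, which also encodes "/".

```python
from urllib.parse import quote, unquote

title = "أبجد هوز"

# quote() encodes the string as UTF-8 first, then percent-encodes the bytes.
path = "/product/" + quote(title, safe="")
print(path)  # /product/%D8%A3%D8%A8%D8%AC%D8%AF%20%D9%87%D9%88%D8%B2

# Decoding reverses both steps and recovers the original title.
decoded = unquote(path)
print(decoded == "/product/" + title)  # True
```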

Sunday, October 2, 2022

Use the new Internationalization module if you're using PHP 5.3. It is built on the ICU library, which is also available for C/C++ and Java.

gettext() is on its way to becoming legacy code.

Sunday, October 16, 2022

Your question is a little bit misguided, as there generally aren't such things in the tzdb as "obsoleted" time zones. In general, once the tzdb has introduced an identifier, it will continue to support it indefinitely, either as a Zone or as a Link.

There are only a couple of rare exceptions:

  • Canada/East-Saskatchewan was removed from the tzdb in version 2017c because it was a misnomer and exceeded the 14-character limit set by the tzdb maintainers. Any usages of that zone should be updated to America/Regina.

  • US/Pacific-New was removed from the tzdb in version 2020b because it created a lot of confusion. It was never a real time zone, but only a link. Any usages of that zone should be updated to America/Los_Angeles.
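Because only these two identifiers are named above as removed, handling them can be as small as a lookup table. A minimal sketch, where the `modernize_zone_id` helper name is hypothetical:

```python
# The two removed identifiers named above, mapped to their replacements.
REMOVED_ZONE_IDS = {
    "Canada/East-Saskatchewan": "America/Regina",
    "US/Pacific-New": "America/Los_Angeles",
}


def modernize_zone_id(zone_id):
    """Return the replacement for a removed identifier, else the input unchanged."""
    return REMOVED_ZONE_IDS.get(zone_id, zone_id)


print(modernize_zone_id("US/Pacific-New"))  # America/Los_Angeles
print(modernize_zone_id("Europe/Paris"))    # Europe/Paris
```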

The examples and list you gave indicate that you would like canonical Zone entries rather than their aliased Link entries. While both are valid, and all tzdb implementations should resolve links correctly, canonical Zone entries should indeed generally be preferred. So I can understand your desire to have all links resolved.

That said, the current implementation of the data file builder in moment-timezone doesn't distinguish between tzdb zones and links while it's building its data. Instead, it creates its own zone entry for the first unique data it encounters, and then creates its own link entry for any subsequent zones whose data matches. This has the disadvantage of not putting tzdb canonical zones front and center (which affects some APIs in older browsers), but also has the advantage of being able to link zones that are identical over the period covered in the data file.

For example, if you look at the current moment-timezone-with-data-2012-2022.js file, you'll find that Europe/Paris is the only zone entry for most places that use CET/CEST. The rest are links back to Europe/Paris in this file, even those that have separate canonical tzdb zones, like Europe/Amsterdam, Europe/Berlin, and others.
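The "first unique data becomes the zone, later matches become links" behavior described above can be sketched as follows. This is a simplified illustration in Python, not moment-timezone's actual builder; the function name, the input shape and the sample data are all made up.

```python
def build_zones_and_links(zone_data):
    """Split an ordered mapping of identifier -> hashable data into zones and links."""
    zones = {}       # identifier -> data, one entry per unique data set
    links = {}       # alias identifier -> identifier of the matching zone
    first_seen = {}  # data -> identifier that first carried it
    for name, data in zone_data.items():
        if data in first_seen:
            links[name] = first_seen[data]
        else:
            first_seen[data] = name
            zones[name] = data
    return zones, links


# Placeholder data: real entries hold offsets and transition times.
sample = {
    "Europe/Paris": ("CET", "CEST"),
    "Europe/Amsterdam": ("CET", "CEST"),
    "Europe/Berlin": ("CET", "CEST"),
    "Europe/London": ("GMT", "BST"),
}
zones, links = build_zones_and_links(sample)
print(sorted(zones))  # ['Europe/London', 'Europe/Paris']
print(links)  # {'Europe/Amsterdam': 'Europe/Paris', 'Europe/Berlin': 'Europe/Paris'}
```

Note that which identifier ends up as the zone depends only on iteration order, which is why a canonical tzdb zone can appear as a link in the output.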

This tradeoff was a design decision by the original author of moment-timezone, and is generally not something that is going to change. It keeps the data file smaller than it would otherwise.

Also, if you are targeting modern browsers (and/or Node.js), I highly recommend using Luxon instead. This is a newer library from the Moment team, and has the advantage of supporting time zones without shipping a data file (because most modern environments have Intl API support).

It's also worth noting that the Intl API tends to be implemented via ICU, which gets some of its data from TZDB and some of it from CLDR. Those two projects have slightly different ideas of what is "canonical". Basically, the canonical zone in CLDR is the one that was introduced first, and it never changes, even if TZDB decides to demote it to a link and replace it with a new name. For example, you may get Asia/Calcutta as the default time zone for India from some Intl implementations, because CLDR considers it canonical. This is another reason why it's important that all implementations support both forms of identifiers equivalently.

Wednesday, October 19, 2022

Our game Gemsweeper has been translated to 8 different languages. Some things I have learned during that process:

  • If the translator is given single sentences to translate, make sure that he knows about the context that each sentence is used in. Otherwise he might provide one possible translation, but not the one you meant. Tools such as Babelfish translate without understanding the context, which is why the result is usually so bad. Just try translating any non-trivial text from English to German and back and you'll see what I mean.

  • Sentences that should be translated must not be broken into different parts for the same reason. That's because you need to maintain the context (see previous point) and because some languages might have the variables at the beginning or end of the sentence. Use placeholders instead of breaking up the sentence. For example, instead of

"This is step" "of our 15-step tutorial"

Write something like:

"This is step %1 of our 15-step tutorial"

and replace the placeholder programmatically.

  • Don't expect the translator to be funny or creative. He usually isn't motivated enough to do it unless you name the particular text passages and pay him extra. For example, if you have any word jokes in your language assets, tell the translator in a side note not to try to translate them, but to leave them out or replace them with a more somber sentence instead. Otherwise the translator will probably translate the joke word by word, which usually results in complete nonsense. In our case we had one translator and one joke writer for the most critical translation (English).

  • Try to find a translator whose first language is the language he is going to translate your software into, not the other way round. Otherwise he is likely to write a text that might be correct, but sounds odd or old-fashioned to native speakers. Also, he should be living in the country you are targeting with your translation. For example, a German-speaking guy from Switzerland would not be a good choice for a German translation.

  • If at all possible, have one of your public beta test users who understands the particular translation verify the translated assets and the completed software. We've had some very good and very bad translations, depending on the person who provided them. According to some of our users, the Swedish translation was total gibberish, but it was too late to do anything about it.

  • Be aware that, for every updated version with new features, you will have to have your language assets translated again. This can create some serious overhead.

  • Be aware that end users will expect tech support to speak their language if your software is translated. Once again, Babelfish will most probably not do.
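The placeholder advice in the list above can be sketched in a few lines. This is a minimal Python illustration; the `fill_placeholders` helper and the second, reordered template are hypothetical and only show that a translator can move placeholders freely.

```python
import re


def fill_placeholders(template, *values):
    """Replace %1, %2, ... with the corresponding positional value."""
    def substitute(match):
        index = int(match.group(1)) - 1
        if 0 <= index < len(values):
            return str(values[index])
        return match.group(0)  # leave unknown placeholders untouched
    return re.sub(r"%(\d+)", substitute, template)


original = "This is step %1 of our %2-step tutorial"
reordered = "Of the %2 tutorial steps, this is number %1"

print(fill_placeholders(original, 3, 15))   # This is step 3 of our 15-step tutorial
print(fill_placeholders(reordered, 3, 15))  # Of the 15 tutorial steps, this is number 3
```

Because the values are addressed by number rather than by position in the sentence, the calling code stays the same no matter how a translation orders them.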

Edit - Some more points

  • Make switching between localizations as easy as possible. In Gemsweeper, we have a hotkey to switch between different languages. It makes testing much easier.

  • If you are going to use exotic fonts, make sure these include special characters. The fonts we chose for Gemsweeper were fine for English text, but we had to add quite a few characters by hand which only exist in German, French, Portuguese, Swedish, ...

  • Don't code your own localization framework. You're probably much better off with an open source framework like Gettext. Gettext supports features like variables within sentences and pluralization, and is rock-solid. Localized resources are compiled, so nobody can tamper with them. Plus, you can use tools like Poedit for translating your files, checking someone else's translation, and making sure that all strings are properly translated and still up to date in case you change the underlying source code. I've tried both rolling my own and using Gettext, and I have to say that Gettext plus Poedit were way superior.
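To give a feel for the pluralization and in-sentence variables mentioned above, here is a small sketch using Python's standard `gettext` module. No compiled catalog is assumed, so `NullTranslations` stands in and simply applies the English singular/plural rule; the message text and the `gems_found` helper are made up.

```python
import gettext

# Stand-in for a real catalog. With compiled .mo files you would instead call
# gettext.translation() with your own domain and locale directory.
translations = gettext.NullTranslations()


def gems_found(count):
    """Pick the singular or plural template, then fill in the named variable."""
    template = translations.ngettext(
        "You found %(count)d gem", "You found %(count)d gems", count
    )
    return template % {"count": count}


print(gems_found(1))  # You found 1 gem
print(gems_found(3))  # You found 3 gems
```

With a real catalog loaded, the same `ngettext` call would select among however many plural forms the target language defines.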

Edits - Even More Points

  • Understand that different cultures have different styles of number and date formats. Numbering schemes differ not only per culture, but also per purpose within that culture. In en-US you might format a number as '-1234', '-1,234' or '(1,234)', depending on the purpose of the number. Understand that other cultures do the same thing.

  • Know where you're getting your globalization information from. E.g. Windows has settings for CurrentCulture, UICulture, and InvariantCulture. Understand what each one means and how it interacts with your system (they're not as obvious as you might think).

  • If you're going to do East Asian translation, really do your homework. East Asian languages differ from Western languages in quite a few ways. In addition to having multiple writing systems that are used simultaneously, they can use different layout systems (top-down or grid-based). Numbers in East Asian languages can also be very different. In en-US you only change systems under limited conditions (e.g. 1 versus 1st); in East Asian languages there are additional numeric considerations beyond the choice of comma and period.
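The three en-US renderings of the same number mentioned in the list above can be reproduced with a small formatter. This is a sketch only: the `format_integer` helper and its style names are hypothetical, and a real application would take separators from locale data rather than a parameter.

```python
def format_integer(value, style="plain", group_sep=","):
    """Format an integer as plain, grouped, or accounting (negatives in parentheses)."""
    if style == "plain":
        return str(value)
    # Group the magnitude with commas, then swap in the requested separator.
    grouped = f"{abs(value):,}".replace(",", group_sep)
    if style == "grouped":
        return f"-{grouped}" if value < 0 else grouped
    if style == "accounting":
        return f"({grouped})" if value < 0 else grouped
    raise ValueError(f"unknown style: {style}")


print(format_integer(-1234))                                 # -1234
print(format_integer(-1234, "grouped"))                      # -1,234
print(format_integer(-1234, "accounting"))                   # (1,234)
print(format_integer(-1234, "grouped", group_sep="."))       # -1.234
```

The last call shows a period as the grouping separator, as is conventional in German; the purpose-dependent styles and the culture-dependent separators vary independently.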

Sunday, August 14, 2022