This blog post is a personal compilation of six technology independent things I learned in the past months about software localization. A few weeks ago we finally went live with our application supporting 22 different languages. As a German development team working for a German customer we used German as our base language within the application. Our customer was responsible for translating the German application messages into the other 21 languages and providing other localized material (images, downloadable documents, etc.).
1. Use a tool
You need a way to share message files and translations between developers and translators. We first started using a simple shared folder within a web hosted collaboration tool. In regular intervals we uploaded the newest versions of the German base message files. Translators used this file as a reference for updating the message files for the other languages.
The obvious problem with this approach is that it causes a lot of unnecessary work. Whenever a message key was removed, added or renamed the change has to manually merged into 22 property files. If a German message changed we had to manually inform the translators so they could adjust the message for the other languages. Clearly this is not a process not want.
Luckily there are some nice tools available that can support you with the whole translation process. We actually moved to the open source tool Pootle which reduced the amount of manual work a lot. However, I am sure that many alternative tools are available. Also note that you don't necessarly need a third party tool for this. If you prefer to save localized messages within a database you could easily create a CRUD UI with simple search funtionallity yourself, which then could be used by translators to update messages.
2. Teach Translators
You should make sure that translators fully understand the syntax of messages. For a developer it might be obvious how placeholders, escaping and date formats work. From a translator's view (who might not have any experience with software development at all) things aren't always that obvious. If your application crashes with date format exceptions in certain languages because the date format DD.mm.YYYY got translated to jour/mois/an (day/month/year in French) you know you have to improve on this point.
Make sure to tell them how placeholders work and which special characters need to be escaped. Give them examples of common date/time patterns including the output those produce. Use comments in message files to provide common formatting options or to explain the placeholders that can be used within messages.
3. Give translators context
Just translating messages from one language into another often isn't enough. Translators need to know the context in which the message is displayed in order to provide an appropriate translation. The first step here is to give them access to a test system where they can see the application with a recent version of their translations.
In regular intervals we received emails from translators with questions like this: Within the application I see message X at position Y. What is the message key for X?
Depending on the message X a simple search for X in the message files doesn't always help (think of placeholders, additional markup or too many messages that contain X). Our solution to this was to extend the way messages were rendered for the UI. After that, it was posible to display the message keys in the test environment by adding an additional url parameter to our application urls. Whenever this url parameter was appended we added a simple <span> tag with a title attribute around rendered messages. So instead of [message] we rendered <span title="[key]">[message]</span>. This made it possible to just hover the displayed message with the mouse to see a small tool tip which shows the message key. This approach isn't 100% bulletproof because in some situations the additional <span> will break the layout. However, 95% of the time it works fine and it reduced the questions we received from translators a lot.
The opposite way of this also exists: I see message X with Key Y in the message file. Where is it displayed in the application?
I think the best solution for this is to follow a logical naming convention for message keys. We used the following simple convention to structure message keys:
news.create.title=Title news.create.title.emptyError=Please add a title news.create.title.maxLengthExceededError=The title cannot be longer than X characters
These are some messages shown at the title input field on the creation form (section) in the news module. The organization levels are split by dots. An error description like maxLengthExceeded does not describe the organization so it is written in camel case instead of news.create.title.max.length.exceeded.
However, this is only a suggestion that worked fine for us. Feel free to come up with you own convention.
4. Keep in mind word widths can vary
Depending on your base language you should be aware that the average character count per word can be much higher or lower in other languages. I haven't found any real statistics of average word lengths, but I can show you some numbers from our message files:
Average characters per word:
These are the average numbers taken from message files with around 1500 messages per file. Please note that these numbers aren't that accurate. To get the words of a message I simply split the messages by spaces. Words and messages can contain additional markup, punctuation or placeholders. However, since markup and placeholders are mostly the same for all languages it still gives some useful information. In our application single words in German or Russian are about 20% longer than English ones.
You should make sure that your application UI supports varying text sizes. This is especially important for buttons and navigation elements which typically expand if their labels get larger. Be also aware that common abbreviations in one language might get translated into one (or maybe more) complete words in other languages. For example FAQ or Q&A are two commonly used navigation elements on English web pages. While the message Questions and Answers can be translated into different languages there might not always be a common abbreviation for this.
5. Test it
Extensively test the localized application: Validate translations, use non western characters as user input and check the functionality for all languages. To underline the importance of testing I just want to give a few examples of locale specific problems we ran into:
- Users of a particular language didn't receive a certain email. It turned out that the email contained a date formatted by a locale dependent pattern. The pattern contained an invalid character, the date formatter failed and the email wasn't sent to the user.
- In certain situations placeholders weren't replaced by actual content in French. The problem was caused by messages that contained unescaped single quotes. In Java's MessageFormat placeholders aren't replaced if they are located between two unescaped single quotes. We only noticed this problem in French because French messages contain much more single quotes than the messages from other languages we support.
- UI elements broke because translated messages where too long and didn't fit into the reserved space.
- It turned out that an external payment provider we are using doesn't support the full UTF-8 character set. So cyrillic characters couldn't be printed on invoices.
6. It takes time
The whole process of localization can take a lot of time. Especially if many people from different countries are involved. So make sure to plan it properly. Remember that every text message you add to the application needs to be translated.
Once we added a small feature which took around a day of development effort. After the development was done it took around three weeks until we could push the feature to the live system. Some translators were on vacation and for some countries legal questions had to be clarified. Additionally we had some dependencies between translators. As mentioned above we used German as base language, but not every translator understood German. So in some cases the German messages had to be translated into English first before they could be translated into other languages.
From a developers point of view this doesn't have to be bad. It is actually a very good excuse if the customer or project manager asks you one day before production release if you could add feature X and Y until tomorrow. Sure you could add it, but there is no chance that it gets localized until tomorrow, so it's better to plan it properly and move it to the next release ;-)