mscharhag, Programming and Stuff;

A blog about programming and software development topics, mostly focused on Java technologies including Java EE, Spring and Grails.

Posts tagged with Localization

  • Saturday, 7 December, 2013

    Java: Moving conditions into Message files

    The Java classes ResourceBundle and MessageFormat provide a nice toolset for resolving localized messages inside Java applications. This post provides a small example on how you can move simple message related conditions from your Java code into message files using ChoiceFormat. If you already know about ChoiceFormat I do not think you will learn anything new in this post. However, in my experience many developers do not know about this nice little feature.

    Let's assume we have an application in which users can comment on some kinds of content. Somewhere in the application we want to display a simple message that shows how often a certain piece of content has been commented on. We want to show the following messages based on the number of comments:

    Number of comments   Message
    0                    This element contains no comments
    1                    This element contains one comment
    2+                   This element contains [numberOfComments] comments

    The standard approach

    To implement this feature using Java's ResourceBundle and MessageFormat we could use the following code:

    Message file (e.g. messages_en.properties):

    comments.no=This element contains no comments
    comments.one=This element contains one comment
    comments.multiple=This element contains {0} comments

    Java code:

    private String resolveMessage(String key, Object... args) {
      String pattern = resourceBundle.getString(key);
      return MessageFormat.format(pattern, args);
    }
    
    private String getMessage(int numberOfComments) {
      String message = null;
      if (numberOfComments == 0) {
        message = resolveMessage("comments.no");
      } else if (numberOfComments == 1) {
        message = resolveMessage("comments.one");
      } else {
        message = resolveMessage("comments.multiple", numberOfComments);
      }
      return message;
    }

    The method resolveMessage() is used to resolve a message key to an actual message using ResourceBundle and MessageFormat. To implement the requested feature we added three message keys to a properties file. Within getMessage() we implemented the logic to decide which message key should be used based on the passed numberOfComments variable.

    The getMessage() method produces the expected result:

    getMessage(0)   // "This element contains no comments"
    getMessage(1)   // "This element contains one comment"
    getMessage(2)   // "This element contains 2 comments"
    getMessage(10)  // "This element contains 10 comments"

    Using ChoiceFormat

    However, there is actually an easier way to do this using ChoiceFormat. We can move the complete logic implemented in getMessage() into the properties file. We only need to define a single key:

    comments.choice=This element contains {0,choice,0#no comments|1#one comment|1<{0} comments}

    Using this message we can completely remove the logic of getMessage():

    private String getMessageUsingChoice(int numberOfComments) {
      return resolveMessage("comments.choice", numberOfComments);
    }

    The result is exactly the same:

    getMessageUsingChoice(0)   // "This element contains no comments"
    getMessageUsingChoice(1)   // "This element contains one comment"
    getMessageUsingChoice(2)   // "This element contains 2 comments"
    getMessageUsingChoice(10)  // "This element contains 10 comments"

    Let's have a closer look at the defined message:

    This element contains {0,choice,0#no comments|1#one comment|1<{0} comments}
    • 0,choice - tells MessageFormat we want to apply a ChoiceFormat for the first parameter (0)
    • 0#no comments - means we want to use the message "no comments" if the first parameter is 0
    • 1#one comment - returns "one comment" if the first parameter is 1
    • 1<{0} comments - uses the sub pattern {0} comments if the first parameter is greater than 1
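    To try the pattern without a properties file, here is a minimal self-contained sketch. The class name ChoiceFormatDemo and the method commentsMessage() are made up for this illustration, and the pattern is inlined instead of being loaded from a ResourceBundle:

```java
import java.text.MessageFormat;
import java.util.Locale;

final class ChoiceFormatDemo {

    // Same pattern as the comments.choice key, inlined for the demo.
    static final String PATTERN =
        "This element contains {0,choice,0#no comments|1#one comment|1<{0} comments}";

    static String commentsMessage(int numberOfComments) {
        // A fixed locale keeps the number formatting of {0} predictable.
        MessageFormat format = new MessageFormat(PATTERN, Locale.ENGLISH);
        return format.format(new Object[] { numberOfComments });
    }

    public static void main(String[] args) {
        for (int n : new int[] { 0, 1, 2, 10 }) {
            System.out.println(n + " -> " + commentsMessage(n));
        }
    }
}
```

    One detail worth knowing: ChoiceFormat falls back to the first choice for values below the first limit, so a negative number would also produce "no comments" with this pattern.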

    In conclusion choices provide a nice way to move simple message related conditions from Java code into message files.

  • Saturday, 16 November, 2013

    Six things I learned for software localization

    This blog post is a personal compilation of six technology independent things I learned in the past months about software localization. A few weeks ago we finally went live with our application supporting 22 different languages. As a German development team working for a German customer we used German as our base language within the application. Our customer was responsible for translating the German application messages into the other 21 languages and providing other localized material (images, downloadable documents, etc.).


    1. Use a tool
    You need a way to share message files and translations between developers and translators. We first started using a simple shared folder within a web hosted collaboration tool. At regular intervals we uploaded the newest versions of the German base message files. Translators used these files as a reference for updating the message files for the other languages.

    The obvious problem with this approach is that it causes a lot of unnecessary work. Whenever a message key was removed, added or renamed, the change had to be manually merged into 22 property files. If a German message changed we had to manually inform the translators so they could adjust the message for the other languages. Clearly this is not a process you want.

    Luckily there are some nice tools available that can support you with the whole translation process. We actually moved to the open source tool Pootle which reduced the amount of manual work a lot. However, I am sure that many alternative tools are available. Also note that you don't necessarily need a third party tool for this. If you prefer to save localized messages within a database you could easily create a CRUD UI with simple search functionality yourself, which could then be used by translators to update messages.


    2. Teach Translators
    You should make sure that translators fully understand the syntax of messages. For a developer it might be obvious how placeholders, escaping and date formats work. From a translator's view (who might not have any experience with software development at all) things aren't always that obvious. If your application crashes with date format exceptions in certain languages because the date format dd.MM.yyyy got translated to jour/mois/an (day/month/year in French) you know you have to improve on this point.

    Make sure to tell them how placeholders work and which special characters need to be escaped. Give them examples of common date/time patterns including the output those produce. Use comments in message files to provide common formatting options or to explain the placeholders that can be used within messages.
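    A small helper can generate such pattern/output examples and, at the same time, catch translated patterns before they reach production. The following is only a sketch: it assumes the date patterns are java.time patterns, and the class name DatePatternCheck and the method tryFormat() are invented for this example.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DatePatternCheck {

    // Fixed sample date so the generated examples are reproducible.
    static final LocalDate SAMPLE = LocalDate.of(2013, 11, 16);

    /** Returns the sample date formatted with the pattern, or null if the pattern is unusable. */
    static String tryFormat(String pattern) {
        try {
            return DateTimeFormatter.ofPattern(pattern).format(SAMPLE);
        } catch (IllegalArgumentException | DateTimeException e) {
            // ofPattern() rejects unknown pattern letters,
            // format() rejects fields that a plain date does not have.
            return null;
        }
    }

    public static void main(String[] args) {
        for (String pattern : new String[] { "dd.MM.yyyy", "MM/dd/yyyy", "jour/mois/an" }) {
            String example = tryFormat(pattern);
            System.out.println(pattern + " -> " + (example != null ? example : "INVALID PATTERN"));
        }
    }
}
```

    Running a check like this over all date patterns of all languages as part of the build would have caught the jour/mois/an problem before any user noticed it.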

     
    3. Give translators context
    Just translating messages from one language into another often isn't enough. Translators need to know the context in which the message is displayed in order to provide an appropriate translation. The first step here is to give them access to a test system where they can see the application with a recent version of their translations.

    At regular intervals we received emails from translators with questions like this: Within the application I see message X at position Y. What is the message key for X?
    Depending on the message X a simple search for X in the message files doesn't always help (think of placeholders, additional markup or too many messages that contain X). Our solution to this was to extend the way messages were rendered for the UI. After that, it was possible to display the message keys in the test environment by adding an additional URL parameter to our application URLs. Whenever this URL parameter was appended we added a simple <span> tag with a title attribute around rendered messages. So instead of [message] we rendered <span title="[key]">[message]</span>. This made it possible to just hover over the displayed message with the mouse to see a small tool tip which shows the message key. This approach isn't 100% bulletproof because in some situations the additional <span> will break the layout. However, 95% of the time it works fine and it reduced the questions we received from translators a lot.
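    The wrapping itself is only a few lines. The sketch below is not our original code; the class MessageKeyHint, the method render() and the showKeys flag (which would be derived from the URL parameter) are hypothetical names used for illustration:

```java
final class MessageKeyHint {

    /**
     * Returns the message unchanged in normal mode. In key-hint mode the message
     * is wrapped in a span whose title attribute contains the message key.
     */
    static String render(String key, String message, boolean showKeys) {
        if (!showKeys) {
            return message;
        }
        return "<span title=\"" + escapeAttribute(key) + "\">" + message + "</span>";
    }

    // Minimal escaping so an unusual key cannot break out of the title attribute.
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        System.out.println(render("news.create.title", "Title", false));
        System.out.println(render("news.create.title", "Title", true));
    }
}
```

    The message text is passed through as-is here because, in this sketch, it is assumed to be already rendered (and possibly to contain markup).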

    The opposite way of this also exists: I see message X with Key Y in the message file. Where is it displayed in the application?
    I think the best solution for this is to follow a logical naming convention for message keys. We used the following simple convention to structure message keys:

    [module].[section].[detail].[optional subdetail]

    Some examples:

    news.create.title=Title
    news.create.title.emptyError=Please add a title
    news.create.title.maxLengthExceededError=The title cannot be longer than X characters    

    These are some messages shown at the title input field on the creation form (section) in the news module. The organization levels are split by dots. An error description like maxLengthExceeded does not describe the organization so it is written in camel case instead of news.create.title.max.length.exceeded.
    However, this is only a suggestion that worked fine for us. Feel free to come up with your own convention.


    4. Keep in mind word widths can vary
    Depending on your base language you should be aware that the average character count per word can be much higher or lower in other languages. I haven't found any real statistics of average word lengths, but I can show you some numbers from our message files:

    Average characters per word:

    Language     Characters   Factor
    English      5.3          1
    Portuguese   5.5          1.04
    French       5.7          1.07
    German       6.4          1.21
    Russian      6.7          1.25


    These are the average numbers taken from message files with around 1500 messages per file. Please note that these numbers aren't that accurate. To get the words of a message I simply split the messages by spaces. Words and messages can contain additional markup, punctuation or placeholders. However, since markup and placeholders are mostly the same for all languages it still gives some useful information. In our application single words in German or Russian are about 20% longer than English ones.
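    The calculation is as naive as it sounds. Here is a rough reconstruction of the approach (the class and method names are made up, and the messages are assumed to be already loaded into a list):

```java
import java.util.List;

final class WordLengthStats {

    /** Splits every message by spaces and returns the average characters per word. */
    static double averageWordLength(List<String> messages) {
        long words = 0;
        long characters = 0;
        for (String message : messages) {
            for (String word : message.split(" ")) {
                if (word.isEmpty()) {
                    continue; // consecutive spaces produce empty tokens
                }
                words++;
                characters += word.length();
            }
        }
        return words == 0 ? 0.0 : (double) characters / words;
    }
}
```

    Punctuation, markup and placeholders are counted as part of the words, which is the inaccuracy mentioned above.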

    You should make sure that your application UI supports varying text sizes. This is especially important for buttons and navigation elements which typically expand if their labels get larger. Also be aware that common abbreviations in one language might get translated into one (or maybe more) complete words in other languages. For example FAQ or Q&A are two commonly used navigation elements on English web pages. While the message Questions and Answers can be translated into different languages there might not always be a common abbreviation for this.


    5. Test it
    Extensively test the localized application: Validate translations, use non-Western characters as user input and check the functionality for all languages. To underline the importance of testing I just want to give a few examples of locale specific problems we ran into:

     

    • Users of a particular language didn't receive a certain email. It turned out that the email contained a date formatted by a locale dependent pattern. The pattern contained an invalid character, the date formatter failed and the email wasn't sent to the user.
    • In certain situations placeholders weren't replaced by actual content in French. The problem was caused by messages that contained unescaped single quotes. In Java's MessageFormat placeholders aren't replaced if they are located between two unescaped single quotes. We only noticed this problem in French because French messages contain many more single quotes than the messages from other languages we support.
    • UI elements broke because translated messages were too long and didn't fit into the reserved space.
    • It turned out that an external payment provider we are using doesn't support the full UTF-8 character set. So Cyrillic characters couldn't be printed on invoices.
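    Some of these problems can be found by automated checks. As an example, the following sketch flags messages that contain a placeholder and a run of single quotes with odd length. It is only a heuristic (intentionally quoted sections are reported too), and the names QuoteLint, hasUnpairedQuote() and suspiciousKeys() are invented for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteLint {

    private static final Pattern QUOTE_RUN = Pattern.compile("'+");

    /** True if the message contains a run of single quotes with odd length, e.g. ' or '''. */
    static boolean hasUnpairedQuote(String message) {
        Matcher matcher = QUOTE_RUN.matcher(message);
        while (matcher.find()) {
            if (matcher.group().length() % 2 != 0) {
                return true;
            }
        }
        return false;
    }

    /** Returns the keys (sorted) of all messages that use placeholders and look badly escaped. */
    static List<String> suspiciousKeys(Map<String, String> messages) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> entry : new TreeMap<>(messages).entrySet()) {
            String message = entry.getValue();
            if (message.contains("{") && hasUnpairedQuote(message)) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }
}
```

    The messages of a properties file could be fed into suspiciousKeys() from a unit test, so a badly escaped French message fails the build instead of surprising a user.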


    6. It takes time
    The whole process of localization can take a lot of time, especially if many people from different countries are involved. So make sure to plan it properly. Remember that every text message you add to the application needs to be translated.

    Once we added a small feature which took around a day of development effort. After the development was done it took around three weeks until we could push the feature to the live system. Some translators were on vacation and for some countries legal questions had to be clarified. Additionally we had some dependencies between translators. As mentioned above we used German as base language, but not every translator understood German. So in some cases the German messages had to be translated into English first before they could be translated into other languages.

    From a developer's point of view this doesn't have to be bad. It is actually a very good excuse if the customer or project manager asks you one day before production release if you could add feature X and Y by tomorrow. Sure you could add it, but there is no chance that it gets localized by tomorrow, so it's better to plan it properly and move it to the next release ;-)

     

  • Friday, 4 October, 2013

    Single quote escaping in Java resource bundles

    In this post I will describe a small pitfall when using Java's ResourceBundle and MessageFormat classes, especially in combination with Spring's ResourceBundleMessageSource.

    ResourceBundle is a widely used class for localizing resources like message strings while MessageFormat can be used to replace placeholders within string messages. The typical usage of ResourceBundle and MessageFormat looks like this:

    messages_en.properties:

    myMessage=Hello {0}

    Java code:

    ResourceBundle bundle = ResourceBundle.getBundle("messages");
    String pattern = bundle.getString("myMessage");         // Hello {0}
    String message = MessageFormat.format(pattern, "John"); // Hello John

    Code similar to this is used for example in the fmt JSP Tag library or in Spring's ResourceBundleMessageSource for retrieving localized messages.

    Whenever you are using MessageFormat you should be aware that the single quote character (') fulfils a special purpose inside message patterns. The single quote is used to represent a section within the message pattern that will not be formatted. A single quote itself must be escaped by using two single quotes ('').

    Let's look at some examples:

    messages_en.properties:

    test.message1=test {0} {1} {2}
    test.message2=test {0} '{1}' {2}
    test.message3=test {0} ''{1}'' {2}
    test.message4=test {0} '''{1}''' {2}
    test.message5=test {0} '{1} {2}
    test.message6=test {0} ''{1} {2}

    Java code:

    for (int i = 1; i <= 6; i++) {
      String pattern = bundle.getString("test.message" + i);
      String message = MessageFormat.format(pattern, 'A', 'B', 'C');
      System.out.println(message);
    }

    Output:

    test A B C
    test A {1} C
    test A 'B' C
    test A '{1}' C
    test A {1} {2}
    test A 'B C

    As we can see placeholders between two simple single quotes are not replaced and all single quotes that are not escaped by another single quote don't show up in the output.

    It is important that everyone working on localization is aware of this. Note that the usage of single quotes varies a lot between languages. You can write pages of German text that do not contain a sole single quote character. In contrast, French text is typically full of single quotes. So it is a good idea to double-check messages of single quote heavy languages like French for proper escaping.
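    If translators deliver plain text without any escaping, the quotes can also be doubled programmatically before the text is used as a MessageFormat pattern. The helper below is a sketch with made-up names; it assumes the input is not escaped yet, because already escaped text would end up with too many quotes:

```java
import java.text.MessageFormat;

final class QuoteEscaper {

    /** Doubles every single quote so MessageFormat treats it as a literal quote. */
    static String escapeQuotes(String rawText) {
        return rawText.replace("'", "''");
    }

    public static void main(String[] args) {
        String raw = "Aujourd'hui: {0}";
        // Unescaped: the quote starts a quoted section, so {0} is not replaced.
        System.out.println(MessageFormat.format(raw, "lundi"));               // Aujourdhui: {0}
        // Escaped: the quote is printed and {0} is replaced.
        System.out.println(MessageFormat.format(escapeQuotes(raw), "lundi")); // Aujourd'hui: lundi
    }
}
```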

    Single quote escaping with Spring's ResourceBundleMessageSource

    Single quote escaping can become a bit more tricky if you are building a Spring application that makes use of Spring's ResourceBundleMessageSource class. If you have properly set up a ResourceBundleMessageSource bean it can be used to retrieve string messages like this:

    messages_en.properties:

    myMessage=Hello {0}

    Java code:

    Object[] args = new Object[] { "John" };
    Locale locale = new Locale("en");
    String message = messageSource.getMessage("myMessage", args, locale); // Hello John

    Internally ResourceBundleMessageSource uses ResourceBundle and MessageFormat for retrieving string messages. However, ResourceBundleMessageSource does a small optimization by default. It will only parse a message through MessageFormat if one or more arguments have been passed in for the message. If no message arguments are passed to getMessage() the message text will be returned as-is. This means that the rules for escaping single quotes only need to be applied if the message takes any arguments.

    Let's look at another example:

    messages_en.properties:

    test.message1=John's message
    test.message2={0}'s message
    test.message3=John's {0}

    Java code:

    private void printMessage(String code, Object... args) {
      Locale locale = new Locale("en");
      String message = messageSource.getMessage(code, args, locale);
      System.out.println(message);
    }
    printMessage("test.message1");
    printMessage("test.message2", "John");
    printMessage("test.message3", "message");

    Output:

    John's message
    Johns message
    Johns {0}

    The first message does not take any arguments, so no MessageFormat is applied and the single quote does not need to be escaped. The second and third messages, however, are formatted by a MessageFormat which processes the single quote characters. In these messages the single quotes should be escaped with another single quote character, otherwise they won't show up in the output.

    From a developer's point of view the rules for escaping single quotes might make sense. However, you will have a hard time explaining these rules to localizers who have only a very basic technical knowledge. It just won't work!

    Luckily ResourceBundleMessageSource has a very useful flag called alwaysUseMessageFormat that can be enabled if all messages should be parsed by MessageFormat. If this flag is set to true (default value is false) all single quotes need to be escaped.
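    In a Spring configuration the flag is enabled with setAlwaysUseMessageFormat(true) on the message source bean. To make the effect visible without a Spring setup, the following plain Java sketch models the behaviour described above. It is a simplified model, not Spring's actual implementation, and the class and method names are invented:

```java
import java.text.MessageFormat;

final class MessageResolutionModel {

    /**
     * Simplified model: without the flag, messages without arguments are returned
     * as-is; with the flag, every message is parsed by MessageFormat.
     */
    static String resolve(String pattern, boolean alwaysUseMessageFormat, Object... args) {
        boolean hasArguments = args != null && args.length > 0;
        if (!alwaysUseMessageFormat && !hasArguments) {
            return pattern;
        }
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        System.out.println(resolve("John's message", false));  // John's message
        System.out.println(resolve("John's message", true));   // Johns message
        System.out.println(resolve("John''s message", true));  // John's message
    }
}
```

    With the flag enabled there is only one rule left for localizers: a single quote is always written as two single quotes.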

    If you are working with multiple localizers you should enable this option to simplify the rule of escaping single quotes. Otherwise, you (or the localizer) will drown in localization tickets if you have to support French ;-)