Saturday, 7 December, 2013

Java: Moving conditions into Message files

The Java classes ResourceBundle and MessageFormat provide a nice toolset for resolving localized messages inside Java applications. This post provides a small example of how you can move simple message-related conditions from your Java code into message files using ChoiceFormat. If you already know about ChoiceFormat, I do not think you will learn anything new in this post. However, in my experience many developers do not know about this nice little feature.

Let's assume we have an application in which users can comment on certain kinds of content. Somewhere in the application we want to display a simple message that shows how often a certain piece of content has been commented on. We want to show the following messages based on the number of comments:

Number of comments	Message
0	This element contains no comments
1	This element contains one comment
2+	This element contains [numberOfComments] comments

The standard approach

To implement this feature using Java's ResourceBundle and MessageFormat we could use the following code:

Message file (e.g. messages_en.properties):

comments.no=This element contains no comments
comments.one=This element contains one comment
comments.multiple=This element contains {0} comments

Java code:

private String resolveMessage(String key, Object... args) {
  String pattern = resourceBundle.getString(key);
  return MessageFormat.format(pattern, args);
}

private String getMessage(int numberOfComments) {
  String message = null;
  if (numberOfComments == 0) {
    message = resolveMessage("comments.no");
  } else if (numberOfComments == 1) {
    message = resolveMessage("comments.one");
  } else {
    message = resolveMessage("comments.multiple", numberOfComments);
  }
  return message;
}

The method resolveMessage() is used to resolve a message key to an actual message using ResourceBundle and MessageFormat. To implement the requested feature we added three message keys to a properties file. Within getMessage() we implemented the logic to decide which message key should be used based on the passed numberOfComments variable.

The getMessage() method produces the expected result:

getMessage(0)   // "This element contains no comments"
getMessage(1)   // "This element contains one comment"
getMessage(2)   // "This element contains 2 comments"
getMessage(10)  // "This element contains 10 comments"

Using ChoiceFormat

However, there is actually an easier way to do this using ChoiceFormat. We can move the complete logic implemented in getMessage() into the properties file. We only need to define a single key:

comments.choice=This element contains {0,choice,0#no comments|1#one comment|1<{0} comments}

Using this message we can completely remove the logic of getMessage():

private String getMessageUsingChoice(int numberOfComments) {
  return resolveMessage("comments.choice", numberOfComments);
}

The result is exactly the same:

getMessageUsingChoice(0)   // "This element contains no comments"
getMessageUsingChoice(1)   // "This element contains one comment"
getMessageUsingChoice(2)   // "This element contains 2 comments"
getMessageUsingChoice(10)  // "This element contains 10 comments"

Let's have a closer look at the defined message:

This element contains {0,choice,0#no comments|1#one comment|1<{0} comments}

0,choice - tells MessageFormat we want to apply a ChoiceFormat for the first parameter (0)
0#no comments - means we want to use the message "no comments" if the first parameter is 0
1#one comment - returns "one comment" if the first parameter is 1
1<{0} comments - uses the sub pattern {0} comments if the first parameter is greater than 1
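
The pattern breakdown above can be tried out without a properties file. The following is a minimal sketch, assuming the pattern is inlined as a constant; the class name ChoicePatternDemo and its method names are made up for illustration. It also shows that ChoiceFormat can be used on its own, outside of MessageFormat.

```java
import java.text.ChoiceFormat;
import java.text.MessageFormat;
import java.util.Locale;

class ChoicePatternDemo {

  // Same pattern as the comments.choice entry, inlined for this sketch
  static final String PATTERN =
      "This element contains {0,choice,0#no comments|1#one comment|1<{0} comments}";

  static String commentsMessage(int numberOfComments) {
    // A fixed locale keeps the number formatting of {0} predictable
    MessageFormat format = new MessageFormat(PATTERN, Locale.ENGLISH);
    return format.format(new Object[] { numberOfComments });
  }

  static String choiceOnly(double value) {
    // ChoiceFormat used directly: each limit maps a number range to a fixed text
    ChoiceFormat choice = new ChoiceFormat("0#no comments|1#one comment|1<many comments");
    return choice.format(value);
  }

  public static void main(String[] args) {
    System.out.println(commentsMessage(0)); // This element contains no comments
    System.out.println(commentsMessage(1)); // This element contains one comment
    System.out.println(commentsMessage(2)); // This element contains 2 comments
    System.out.println(choiceOnly(5));      // many comments
  }
}
```

Note that the {0} inside the last choice is only resolved because MessageFormat formats the selected choice text again when it contains a placeholder. A standalone ChoiceFormat has no access to the message arguments.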

In conclusion, choices provide a nice way to move simple message-related conditions from Java code into message files.

Tags: Java, Localization


Saturday, 23 November, 2013

Getting started with Spring Data Solr
Spring Data Solr is an extension to the Spring Data project which aims to simplify the usage of Apache Solr in Spring applications. Please note that this is not an introduction to Spring (Data) or Solr. I assume you have at least some basic understanding of both technologies. In the following post I will show how you can use Spring Data repositories to access Solr features in Spring applications.

Configuration

First we need a running Solr server. For simplicity we will use the example configuration that comes with the current Solr release (4.5.1 at the time of writing) and is described in the official Solr tutorial. So we only have to download Solr, extract it into a directory of our choice and then run java -jar start.jar from the <solr home>/example directory.

Now let's move to our demo application and add the Spring Data Solr dependency using Maven:
```
<dependency>
  <groupId>org.springframework.data</groupId>
  <artifactId>spring-data-solr</artifactId>
  <version>1.0.0.RELEASE</version>
</dependency>
```
In this example I am using Spring Boot to set up a small example Spring application. I am using the following Spring Boot dependencies and the Spring Boot parent pom for this:
```
<parent>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-parent</artifactId>
  <version>0.5.0.BUILD-SNAPSHOT</version>
</parent>
```
```
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-test</artifactId>
  <scope>test</scope>
</dependency>
```
Don't worry if you haven't used Spring Boot yet. These dependencies mainly act as a shortcut for common (Spring) dependencies and simplify the configuration a bit. If you want to integrate Spring Data Solr into an existing Spring application you can skip the Spring Boot dependencies.

The Spring bean configuration is quite simple: we only have to define two beans ourselves:
```
@ComponentScan
@EnableSolrRepositories("com.mscharhag.solr.repository")
public class Application {

  @Bean
  public SolrServer solrServer() {
    return new HttpSolrServer("http://localhost:8983/solr");
  }

  @Bean
  public SolrTemplate solrTemplate(SolrServer server) throws Exception {
    return new SolrTemplate(server);
  }
}
```
The solrServer bean is used to connect to the running Solr instance. Since Spring Data Solr uses Solrj we create a Solrj HttpSolrServer instance. It would also be possible to use an embedded Solr server by using EmbeddedSolrServer. The SolrTemplate provides common functionality to work with Solr (similar to Spring's JdbcTemplate). A solrTemplate bean is required for creating Solr repositories. Please also note the @EnableSolrRepositories annotation. With this annotation we tell Spring Data Solr to look in the specified package for Solr repositories.

Creating a document

Before we can query Solr we have to add documents to the index. To define a document we create a POJO and add Solrj annotations to it. In this example we will use a simple Book class as document:
```
public class Book {

  @Field
  private String id;
  
  @Field
  private String name;
  
  @Field
  private String description;
  
  @Field("categories_txt")
  private List<Category> categories;
  
  // getters/setters
}
```
```
public enum Category {
  EDUCATION, HISTORY, HUMOR, TECHNOLOGY, ROMANCE, ADVENTURE
}
```
Each book has a unique id, a name, a description and belongs to one or more categories. Note that Solr requires a unique ID of type String for each document by default. Fields that should be added to the Solr index are annotated with the Solrj @Field annotation. By default Solrj tries to map document field names to Solr fields of the same name. The Solr example configuration already defines Solr fields named id, name and description so it should not be necessary to add these fields to the Solr configuration.

In case you want to change the Solr field definitions you can find the example configuration file at <solr home>/example/solr/collection1/conf/schema.xml. Within this file you should find the following field definitions:
```
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="description" type="text_general" indexed="true" stored="true"/>
```
In general title would be a better attribute name for a Book than name. However, by using name we can use the default Solr field configuration. So I go for name instead of title for simplicity reasons.

For categories we have to define the field name manually using the @Field annotation: categories_txt. This matches the dynamic field named *_txt from the Solr example. This field definition can also be found in schema.xml:
```
<dynamicField name="*_txt" type="text_general"   indexed="true"  stored="true" multiValued="true"/>
```
Creating a repository

Spring Data uses repositories to simplify the usage of various data access technologies. A repository is basically an interface whose implementation is dynamically generated by Spring Data on application start. The generated implementation is based on naming conventions used in the repository interface. If this is new to you I recommend reading Working with Spring Data Repositories.

Spring Data Solr uses the same approach. We use naming conventions and annotations inside interfaces to define the methods we need to access Solr features. We start with a simple repository that contains only one method (we will add more later):
```
public interface BookRepository extends SolrCrudRepository<Book, String> {

  List<Book> findByName(String name);

}
```
We get some common methods like save(), findAll(), delete() or count() in the repository by extending SolrCrudRepository. With the definition of the interface method findByName(String name) we tell Spring Data Solr to create a method implementation that queries Solr for a list of books. The book names in this list should match the passed parameter.

The repository implementation can be injected into other classes using Spring's DI functionality. In this example we inject the repository into a simple JUnit test:
```
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = Application.class, loader=SpringApplicationContextLoader.class)
public class BookRepositoryTests {
  
  @Autowired
  private BookRepository bookRepository;
  
  ...
}
```
Adding a document to Solr

Now it is time to add some books to Solr. Using our repository this is a very easy job:
```
private void addBookToIndex(String name, String description, Category... categories) {
  Book book = new Book();
  book.setName(name);
  book.setDescription(description);
  book.setCategories(Arrays.asList(categories));
  book.setId(UUID.randomUUID().toString());
  bookRepository.save(book);
}

private void createSampleData() {
  addBookToIndex("Treasure Island", "Best seller by R.L.S.", Category.ADVENTURE);
  addBookToIndex("The Pirate Island", "Oh noes, the pirates are coming!", Category.ADVENTURE, Category.HUMOR);
  ...
}
```
Adding pagination and boosting

Assume we have an application where users are able to search for books. We need to find books whose name or description match the search query given by the user. For performance reasons we want to add some kind of pagination which shows only 10 search results at once to the user.

Let's create a new method in our repository interface for this:
```
Page<Book> findByNameOrDescription(@Boost(2) String name, String description, Pageable pageable);
```
The method name findByNameOrDescription tells Spring Data Solr to query for book objects whose name or description match the passed parameters. To support pagination we added the Pageable parameter and changed the return type from List<Book> to Page<Book>. By adding the @Boost annotation to the name parameter we are boosting books whose name matches the search parameter. This makes sense because those books are typically of higher interest to the user.

If we now want to query for the first page containing 10 elements we just have to do:
```
Page<Book> booksPage = bookRepository.findByNameOrDescription(searchString, searchString, new PageRequest(0, 10));
```
Besides the first 10 search results Page<Book> provides some useful methods for building pagination functionality:
```
booksPage.getContent()       // get a list of (max) 10 books
booksPage.getTotalElements() // total number of elements (can be >10)
booksPage.getTotalPages()    // total number of pages
booksPage.getNumber()        // current page number
booksPage.isFirstPage()      // true if this is the first page
booksPage.hasNextPage()      // true if another page is available
booksPage.nextPageable()     // the pageable for requesting the next page
...
```
Faceting

Whenever a user searches for a book name we want to show them how many books matching the given query parameter are available in the different categories. This feature is called faceted search and is directly supported by Spring Data Solr. We just have to add another method to our repository interface:
```
@Query("name:?0")
@Facet(fields = { "categories_txt" }, limit = 5)
FacetPage<Book> findByNameAndFacetOnCategories(String name, Pageable page);
```
This time the query will be derived from the @Query annotation (containing the Solr query) instead of the method name. With the @Facet annotation we tell Spring Data Solr to facet books by categories and return the first five facets.

It would also be possible to remove the @Query annotation and change the method name to findByName for the same effect. The small disadvantage of this approach is that it is not obvious to the caller that this repository method performs faceting. Additionally, the method signature might collide with other methods that search books by name.

Usage:
```
FacetPage<Book> booksFacetPage = bookRepository.findByNameAndFacetOnCategories(bookName, new PageRequest(0, 10));

booksFacetPage.getContent(); // the first 10 books

for (Page<? extends FacetEntry> page : booksFacetPage.getAllFacets()) {
  for (FacetEntry facetEntry : page.getContent()) {
    String categoryName = facetEntry.getValue();  // name of the category
    long count = facetEntry.getValueCount();      // number of books in this category
    
    // convert the category name back to an enum
    Category category = Category.valueOf(categoryName.toUpperCase());
  }
}
```
Note that booksFacetPage.getAllFacets() returns a Collection of FacetEntry pages. This is because the @Facet annotation allows you to facet multiple fields at once. Each of these pages contains at most five FacetEntry objects (defined by the limit attribute of @Facet).

Highlighting

Often it is useful to highlight the search query occurrences in the list of search results (as Google or Bing do). This can be achieved with the highlighting feature of (Spring Data) Solr.

Let's add another repository method:
```
@Highlight(prefix = "<highlight>", postfix = "</highlight>")
HighlightPage<Book> findByDescription(String description, Pageable pageable);
```
The @Highlight annotation tells Solr to highlight the occurrences of the searched description.

Usage:
```
HighlightPage<Book> booksHighlightPage = bookRepository.findByDescription(description, new PageRequest(0, 10));

booksHighlightPage.getContent(); // first 10 books

for (HighlightEntry<Book> he : booksHighlightPage.getHighlighted()) {
  // A HighlightEntry belongs to an Entity (Book) and may have multiple highlighted fields (description)
  for (Highlight highlight : he.getHighlights()) {
    // Each highlight might have multiple occurrences within the description
    for (String snipplet : highlight.getSnipplets()) {
      // snipplet contains the highlighted text
    }
  }
}
```
If we use this repository method to query for books whose description contains the string Treasure Island a snipplet might look like this:
```
<highlight>Treasure Island</highlight> is a tale of pirates and villains, maps, treasure and shipwreck, and is perhaps one of the best adventure story ever written.
```
In this case Treasure Island is located at the beginning of the description and is highlighted with the prefix and postfix defined in the @Highlight annotation. This additional markup can be used to mark query occurrences when the search results are shown to the user.

Conclusion

Spring Data Solr provides a very simple way to integrate Solr into Spring applications. With the repository abstraction it follows the same design principle as most other Spring Data projects. The only small drawback I faced while playing around with Spring Data Solr was the documentation, which could be improved here and there.

You can find the complete source code for this example on GitHub.
Tags: Java, Spring Data, Solr


Saturday, 16 November, 2013

Six things I learned for software localization

This blog post is a personal compilation of six technology-independent things I learned in the past months about software localization. A few weeks ago we finally went live with our application supporting 22 different languages. As a German development team working for a German customer we used German as our base language within the application. Our customer was responsible for translating the German application messages into the other 21 languages and providing other localized material (images, downloadable documents, etc.).

1. Use a tool
You need a way to share message files and translations between developers and translators. We first started using a simple shared folder within a web hosted collaboration tool. At regular intervals we uploaded the newest versions of the German base message files. Translators used these files as a reference for updating the message files for the other languages.

The obvious problem with this approach is that it causes a lot of unnecessary work. Whenever a message key was removed, added or renamed, the change had to be manually merged into 22 property files. If a German message changed we had to manually inform the translators so they could adjust the message for the other languages. Clearly this is not a process you want.

Luckily there are some nice tools available that can support you with the whole translation process. We actually moved to the open source tool Pootle, which reduced the amount of manual work a lot. However, I am sure that many alternative tools are available. Also note that you don't necessarily need a third party tool for this. If you prefer to save localized messages within a database you could easily create a CRUD UI with simple search functionality yourself, which then could be used by translators to update messages.

2. Teach Translators
You should make sure that translators fully understand the syntax of messages. For a developer it might be obvious how placeholders, escaping and date formats work. From a translator's view (who might not have any experience with software development at all) things aren't always that obvious. If your application crashes with date format exceptions in certain languages because the date pattern dd.MM.yyyy got translated to jour/mois/an (day/month/year in French), you know you have to improve on this point.

Make sure to tell them how placeholders work and which special characters need to be escaped. Give them examples of common date/time patterns including the output those produce. Use comments in message files to provide common formatting options or to explain the placeholders that can be used within messages.
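
To make this concrete, here is a small sketch of the kind of example that could be handed to translators. It assumes Java's SimpleDateFormat pattern syntax, in which day.month.year is written dd.MM.yyyy; the class DatePatternDemo and its helper methods are hypothetical and not part of our application. It shows what two valid patterns produce and that a "translated" pattern is rejected.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

class DatePatternDemo {

  // Fixed sample date (7 December 2013, UTC) so the output is reproducible
  static Date sampleDate() {
    Calendar calendar = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ENGLISH);
    calendar.clear();
    calendar.set(2013, Calendar.DECEMBER, 7);
    return calendar.getTime();
  }

  static String format(String pattern, Date date) {
    SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    return format.format(date);
  }

  // Returns false if SimpleDateFormat rejects the pattern
  static boolean isValidPattern(String pattern) {
    try {
      new SimpleDateFormat(pattern, Locale.ENGLISH);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(format("dd.MM.yyyy", sampleDate())); // 07.12.2013
    System.out.println(format("dd/MM/yyyy", sampleDate())); // 07/12/2013
    System.out.println(isValidPattern("jour/mois/an"));     // false
  }
}
```

A check like isValidPattern() could also run in a unit test over all message files, so that a broken pattern is found before it reaches users.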

3. Give translators context
Just translating messages from one language into another often isn't enough. Translators need to know the context in which the message is displayed in order to provide an appropriate translation. The first step here is to give them access to a test system where they can see the application with a recent version of their translations.

In regular intervals we received emails from translators with questions like this: Within the application I see message X at position Y. What is the message key for X?
Depending on the message X, a simple search for X in the message files doesn't always help (think of placeholders, additional markup or too many messages that contain X). Our solution to this was to extend the way messages were rendered for the UI. After that, it was possible to display the message keys in the test environment by adding an additional URL parameter to our application URLs. Whenever this URL parameter was appended, we added a simple <span> tag with a title attribute around rendered messages. So instead of [message] we rendered <span title="[key]">[message]</span>. This made it possible to just hover over the displayed message with the mouse to see a small tooltip which shows the message key. This approach isn't 100% bulletproof because in some situations the additional <span> will break the layout. However, 95% of the time it works fine and it reduced the questions we received from translators a lot.
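
The idea can be sketched in a few lines. This is not our actual rendering code: the class MessageRenderer, the showKeys flag (standing in for the URL parameter) and the escaping helper are assumptions made for illustration.

```java
class MessageRenderer {

  // Wraps the already resolved message in a <span> whose title carries the key.
  // showKeys is a hypothetical flag that would be derived from the URL parameter.
  static String render(String key, String message, boolean showKeys) {
    if (!showKeys) {
      return message;
    }
    return "<span title=\"" + escapeAttribute(key) + "\">" + message + "</span>";
  }

  // Minimal escaping so the key cannot break out of the title attribute
  static String escapeAttribute(String value) {
    return value.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
  }

  public static void main(String[] args) {
    System.out.println(render("news.create.title", "Title", false)); // Title
    System.out.println(render("news.create.title", "Title", true));
    // <span title="news.create.title">Title</span>
  }
}
```

Keeping the wrapping in one central place makes it easy to guarantee that the flag is never active in production.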

The opposite way of this also exists: I see message X with Key Y in the message file. Where is it displayed in the application?
I think the best solution for this is to follow a logical naming convention for message keys. We used the following simple convention to structure message keys:

[module].[section].[detail].[optional subdetail]

Some examples:

news.create.title=Title
news.create.title.emptyError=Please add a title
news.create.title.maxLengthExceededError=The title cannot be longer than X characters

These are some messages shown at the title input field on the creation form (section) in the news module. The organization levels are split by dots. An error description like maxLengthExceeded does not describe the organization so it is written in camel case instead of news.create.title.max.length.exceeded.
However, this is only a suggestion that worked fine for us. Feel free to come up with your own convention.

4. Keep in mind word widths can vary
Depending on your base language you should be aware that the average character count per word can be much higher or lower in other languages. I haven't found any real statistics of average word lengths, but I can show you some numbers from our message files:

Average characters per word:

Language	Characters	Factor
English	5.3	1
Portuguese	5.5	1.04
French	5.7	1.07
German	6.4	1.21
Russian	6.7	1.25

These are the average numbers taken from message files with around 1500 messages per file. Please note that these numbers aren't that accurate. To get the words of a message I simply split the messages by spaces. Words and messages can contain additional markup, punctuation or placeholders. However, since markup and placeholders are mostly the same for all languages it still gives some useful information. In our application single words in German or Russian are about 20% longer than English ones.
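
The measurement described above is easy to reproduce. The following sketch is not the script I used; the class WordLengthStats and its method name are made up here. Like the original measurement it splits messages by spaces only, so markup, punctuation and placeholders count as part of a word.

```java
import java.util.Arrays;
import java.util.List;

class WordLengthStats {

  // Average number of characters per word, where words are separated by spaces
  static double averageWordLength(List<String> messages) {
    long characters = 0;
    long words = 0;
    for (String message : messages) {
      for (String word : message.trim().split(" +")) {
        if (word.isEmpty()) {
          continue; // an empty message yields one empty token
        }
        characters += word.length();
        words++;
      }
    }
    return words == 0 ? 0.0 : (double) characters / words;
  }

  public static void main(String[] args) {
    List<String> messages = Arrays.asList("Title", "Please add a title");
    System.out.println(averageWordLength(messages)); // 4.0
  }
}
```

To get the factor from the table, divide the average of one language by the average of the reference language (English in our case).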

You should make sure that your application UI supports varying text sizes. This is especially important for buttons and navigation elements which typically expand if their labels get larger. Also be aware that common abbreviations in one language might get translated into one (or maybe more) complete words in other languages. For example FAQ or Q&A are two commonly used navigation elements on English web pages. While the message Questions and Answers can be translated into different languages, there might not always be a common abbreviation for it.

5. Test it
Extensively test the localized application: validate translations, use non-Western characters as user input and check the functionality for all languages. To underline the importance of testing I just want to give a few examples of locale-specific problems we ran into:

Users of a particular language didn't receive a certain email. It turned out that the email contained a date formatted by a locale dependent pattern. The pattern contained an invalid character, the date formatter failed and the email wasn't sent to the user.
In certain situations placeholders weren't replaced by actual content in French. The problem was caused by messages that contained unescaped single quotes. In Java's MessageFormat placeholders aren't replaced if they are located between two unescaped single quotes. We only noticed this problem in French because French messages contain many more single quotes than the messages from other languages we support.
UI elements broke because translated messages were too long and didn't fit into the reserved space.
It turned out that an external payment provider we are using doesn't support the full UTF-8 character set. So Cyrillic characters couldn't be printed on invoices.
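
The single quote problem from the list above is easy to reproduce. This is a minimal sketch with a made-up French message; the class SingleQuoteDemo is hypothetical. In a MessageFormat pattern a single quote starts a quoted section, and a literal apostrophe has to be written as two single quotes.

```java
import java.text.MessageFormat;
import java.util.Locale;

class SingleQuoteDemo {

  static String format(String pattern, Object... args) {
    return new MessageFormat(pattern, Locale.FRENCH).format(args);
  }

  public static void main(String[] args) {
    // Unescaped: everything between the two quotes is treated as literal text,
    // so {0} is not replaced and both quotes disappear
    System.out.println(format("L'article {0} n'existe pas", "Java"));
    // Larticle {0} nexiste pas

    // Escaped: '' produces a literal apostrophe and {0} is replaced
    System.out.println(format("L''article {0} n''existe pas", "Java"));
    // L'article Java n'existe pas
  }
}
```

A message with only one unescaped single quote fails just as silently: the quote is dropped and every placeholder after it stays unresolved.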

6. It takes time
The whole process of localization can take a lot of time, especially if many people from different countries are involved. So make sure to plan it properly. Remember that every text message you add to the application needs to be translated.

Once we added a small feature which took around a day of development effort. After the development was done it took around three weeks until we could push the feature to the live system. Some translators were on vacation and for some countries legal questions had to be clarified. Additionally we had some dependencies between translators. As mentioned above we used German as base language, but not every translator understood German. So in some cases the German messages had to be translated into English first before they could be translated into other languages.

From a developer's point of view this doesn't have to be bad. It is actually a very good excuse if the customer or project manager asks you one day before the production release if you could add features X and Y by tomorrow. Sure you could add them, but there is no chance that they get localized by tomorrow, so it's better to plan it properly and move it to the next release ;-)

Tags: Localization


Sunday, 3 November, 2013

Groovy's magical NullObject
In this post I am going to explain some not-so-obvious differences between null in Java and null in Groovy.

Let's start with the following line:
```
Object o = null
```
This statement works fine in Java and Groovy (except that Java requires a semicolon at line end).
However, it has slightly different effects.

In Java null is a special literal, which is assigned to reference types that do not point to any object. Every time you try to do anything on a null reference (like calling methods or accessing member variables) a NullPointerException will be thrown.
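
For comparison with the Groovy behavior, here is the Java side as a small sketch (the class JavaNullDemo is made up for this post). Calling a method on a null reference throws a NullPointerException, while string concatenation is handled by the compiler and converts null to the text "null" without dereferencing it.

```java
class JavaNullDemo {

  // Returns true if calling a method on the reference throws a NullPointerException
  static boolean throwsNpe(Object reference) {
    try {
      reference.toString();
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  // String concatenation never dereferences the operand
  static String concat(Object reference) {
    return reference + "!";
  }

  public static void main(String[] args) {
    System.out.println(throwsNpe(null));   // true
    System.out.println(throwsNpe("text")); // false
    System.out.println(concat(null));      // null!
  }
}
```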

In Groovy null is an object! It is an instance of org.codehaus.groovy.runtime.NullObject. Most of the time NullObject will throw a NullPointerException if you try to access a method or a member variable. However, there are a few methods that can be called on NullObject:
```
import org.codehaus.groovy.runtime.NullObject

assert NullObject == null.getClass()
assert       true == null.equals(null)
assert      false == null.asBoolean()
assert    "null!" == null + "!"
assert      false == null.iterator().hasNext()
```
As we can see, the null object protects developers from NullPointerExceptions in some cases. asBoolean() always returns false and ensures that null can be converted to a boolean value when necessary. iterator() returns an instance of java.util.Collections$EmptyIterator. Because of that it is possible to safely iterate over objects without explicitly checking for null.

Interestingly, I haven't found anything about NullObject in the official Groovy documentation. It is mentioned neither in Differences from Java nor in Groovy's Null Object Pattern.

There might be no practical use case for this but you can even create your own instance of NullObject:
```
Class c = null.getClass()
NullObject myNull = c.newInstance()
```
But be aware that the equals() method only returns true if you pass in the default NullObject instance. So it might not work correctly for your own NullObject instance:
```
assert false == myNull.equals(myNull)
assert  true == myNull.equals(null)
```
You can also modify the metaClass of NullObject to add your own methods:
```
NullObject.metaClass.myMethod = { println "I am null" }
null.myMethod()
```
Tags: Groovy

Tuesday, 22 October, 2013

Two things to remember when using Java RMI
This is a short blog post about two common pitfalls you should be aware of when using Java RMI.

Setting java.rmi.server.hostname

If you are getting strange Connection refused to host: <ip> error messages on the RMI client and you are sure the connection should work (you double checked all the standard things like network configuration etc.) the RMI system property java.rmi.server.hostname is something to look at.

To call a method on a remote object the RMI client first has to retrieve a remote stub object from the RMI registry. This stub object contains the server address that is later used to connect to the remote object when a remote method should be called (the connection to the RMI registry and the connection to the remote object are two completely different things). By default the server will try to detect its own address and pass it to the stub object. Unfortunately the algorithm that is used to detect the server address doesn't always produce a useful result (depending on the network configuration).

It is possible to override the server address that is passed to the stub object, by setting the system property java.rmi.server.hostname on the RMI server.

This can either be done in Java code

System.setProperty("java.rmi.server.hostname", "<<rmi server ip>>");

or by adding a Java command line parameter:

-Djava.rmi.server.hostname=<<rmi server ip>>

Setting RMI service ports

If you have trouble making RMI calls through a firewall you should make sure you set specific ports for remote objects. By default port 1099 is used by the RMI registry, so make sure you opened this port in the firewall. However, this port is only used by the client to connect to the RMI registry and not for the communication between the stub and the remote object. For the latter, random ports are used by default. Since you don't want to open all ports in your firewall you should set a specific port for RMI remote objects.

This can be done by overriding the createServerSocket() method of RMISocketFactory:
```
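import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.rmi.server.RMISocketFactory;

public class MyRMISocketFactory extends RMISocketFactory {
  private static final int PREFERRED_PORT = 1234;
  // Both factory methods of RMISocketFactory are abstract, so delegate to the default factory
  private final RMISocketFactory defaultFactory = RMISocketFactory.getDefaultSocketFactory();

  public ServerSocket createServerSocket(int port) throws IOException {
    // 0 means "any free port": use the fixed port instead
    return port == 0 ? new ServerSocket(PREFERRED_PORT) : defaultFactory.createServerSocket(port);
  }

  public Socket createSocket(String host, int port) throws IOException {
    return defaultFactory.createSocket(host, port);
  }
}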
```
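By default createServerSocket() chooses a free random port if 0 is passed as parameter. In this modified version of createServerSocket() a server socket bound to a specific port (1234) is returned when 0 is passed as parameter. The factory takes effect once it has been registered with RMISocketFactory.setSocketFactory(new MyRMISocketFactory()), which has to happen before remote objects are exported.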

If you are using Spring's RmiServiceExporter you can use the setServicePort() method to export services on a specific port:
```
<bean class="org.springframework.remoting.rmi.RmiServiceExporter">
  <property name="servicePort" value="1234"/>
  ...
</bean>
```
Note that multiple remote objects/services can share the same port. After you set a specific port, you just have to open this port in your firewall.
Tags: Java

