mscharhag, Programming and Stuff;

A blog about programming and software development topics, mostly focused on Java technologies including Java EE, Spring and Grails.

Saturday, 23 November, 2013

Getting started with Spring Data Solr

Spring Data Solr is an extension to the Spring Data project which aims to simplify the usage of Apache Solr in Spring applications. Please note that this is not an introduction to Spring (Data) or Solr. I assume you have at least some basic understanding of both technologies. In this post I will show how you can use Spring Data repositories to access Solr features in Spring applications.

Configuration

First we need a running Solr server. For simplicity we will use the example configuration that comes with the current Solr release (4.5.1 at the time of writing) and is described in the official Solr tutorial. So we only have to download Solr, extract it to a directory of our choice and then run java -jar start.jar from the <solr home>/example directory.

Now let's move to our demo application and add the Spring Data Solr dependency using Maven:

<dependency>
  <groupId>org.springframework.data</groupId>
  <artifactId>spring-data-solr</artifactId>
  <version>1.0.0.RELEASE</version>
</dependency>

In this example I am using Spring Boot to set up a small example Spring application. I am using the following Spring Boot dependencies and the Spring Boot parent pom for this:

<parent>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-parent</artifactId>
  <version>0.5.0.BUILD-SNAPSHOT</version>
</parent>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-test</artifactId>
  <scope>test</scope>
</dependency>

Don't worry if you haven't used Spring Boot yet. These dependencies mainly act as a shortcut for common (Spring) dependencies and simplify the configuration a bit. If you want to integrate Spring Data Solr into an existing Spring application you can skip the Spring Boot dependencies.

The Spring bean configuration is quite simple. We only have to define two beans ourselves:

@ComponentScan
@EnableSolrRepositories("com.mscharhag.solr.repository")
public class Application {

  @Bean
  public SolrServer solrServer() {
    return new HttpSolrServer("http://localhost:8983/solr");
  }

  @Bean
  public SolrTemplate solrTemplate(SolrServer server) throws Exception {
    return new SolrTemplate(server);
  }
}

The solrServer bean is used to connect to the running Solr instance. Since Spring Data Solr uses Solrj we create a Solrj HttpSolrServer instance. It would also be possible to use an embedded Solr server by using EmbeddedSolrServer. The SolrTemplate provides common functionality to work with Solr (similar to Spring's JdbcTemplate). A solrTemplate bean is required for creating Solr repositories. Please also note the @EnableSolrRepositories annotation. With this annotation we tell Spring Data Solr to look in the specified package for Solr repositories.

Creating a document

Before we can query Solr we have to add documents to the index. To define a document we create a POJO and add Solrj annotations to it. In this example we will use a simple Book class as document:

public class Book {

  @Field
  private String id;
  
  @Field
  private String name;
  
  @Field
  private String description;
  
  @Field("categories_txt")
  private List<Category> categories;
  
  // getters/setters
}

public enum Category {
  EDUCATION, HISTORY, HUMOR, TECHNOLOGY, ROMANCE, ADVENTURE
}

Each book has a unique id, a name, a description and belongs to one or more categories. Note that Solr requires a unique ID of type String for each document by default. Fields that should be added to the Solr index are annotated with the Solrj @Field annotation. By default Solrj tries to map document field names to Solr fields of the same name. The Solr example configuration already defines Solr fields named id, name and description so it should not be necessary to add these fields to the Solr configuration.

In case you want to change the Solr field definitions you can find the example configuration file at <solr home>/example/solr/collection1/conf/schema.xml. Within this file you should find the following field definitions:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="description" type="text_general" indexed="true" stored="true"/>

In general, title would be a better attribute name for a Book than name. However, by using name we can rely on the default Solr field configuration, so I go for name instead of title to keep things simple.

For categories we have to define the field name manually using the @Field annotation: categories_txt. This matches the dynamic field named *_txt from the Solr example. This field definition can also be found in schema.xml:

<dynamicField name="*_txt" type="text_general"   indexed="true"  stored="true" multiValued="true"/>
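To make the matching rule a bit more concrete, here is a minimal sketch in plain Java of how a field name like categories_txt lines up with a dynamic field pattern like *_txt. The class DynamicFieldSketch and its matches method are hypothetical names made up for this illustration. This is not how Solr implements the lookup internally, and the sketch assumes patterns with a single leading or trailing wildcard only.

```java
// Hypothetical illustration class; not part of Solr or Spring Data Solr.
class DynamicFieldSketch {

    // Returns true if fieldName matches a pattern that has a single
    // leading or trailing '*' wildcard, e.g. "*_txt" or "attr_*".
    // A pattern without a wildcard must match the field name exactly.
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(fieldName);
    }

    public static void main(String[] args) {
        System.out.println(matches("*_txt", "categories_txt")); // prints "true"
        System.out.println(matches("*_txt", "categories"));     // prints "false"
    }
}
```

So a field named categories would not be picked up by the *_txt dynamic field, which is why the @Field annotation on the categories attribute carries the explicit name categories_txt.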

Creating a repository

Spring Data uses repositories to simplify the usage of various data access technologies. A repository is basically an interface whose implementation is dynamically generated by Spring Data on application start. The generated implementation is based on naming conventions used in the repository interface. If this is new to you, I recommend reading "Working with Spring Data Repositories".

Spring Data Solr uses the same approach. We use naming conventions and annotations inside interfaces to define the methods we need to access Solr features. We start with a simple repository that contains only one method (we will add more later):

public interface BookRepository extends SolrCrudRepository<Book, String> {

  List<Book> findByName(String name);

}

We get some common methods like save(), findAll(), delete() or count() in the repository by extending SolrCrudRepository. With the definition of the interface method findByName(String name) we tell Spring Data Solr to create a method implementation that queries Solr for a list of books. Each book in the returned list has a name that matches the passed parameter.

The repository implementation can be injected into other classes using Spring's DI functionality. In this example we inject the repository into a simple JUnit test:

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = Application.class, loader=SpringApplicationContextLoader.class)
public class BookRepositoryTests {
  
  @Autowired
  private BookRepository bookRepository;
  
  ...
}

Adding a document to Solr

Now it is time to add some books to Solr. Using our repository this is a very easy job:

private void addBookToIndex(String name, String description, Category... categories) {
  Book book = new Book();
  book.setName(name);
  book.setDescription(description);
  book.setCategories(Arrays.asList(categories));
  book.setId(UUID.randomUUID().toString());
  bookRepository.save(book);
}

private void createSampleData() {
  addBookToIndex("Treasure Island", "Best seller by R.L.S.", Category.ADVENTURE);
  addBookToIndex("The Pirate Island", "Oh noes, the pirates are coming!", Category.ADVENTURE, Category.HUMOR);
  ...
}
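A side note on the IDs: addBookToIndex() uses a random UUID, so calling createSampleData() a second time adds the same books again under new IDs. Solr replaces an existing document when a new one with the same unique ID is added, so a repeatable ID avoids these duplicates. The following is only a sketch of one possible approach. StableIdSketch and idForName are hypothetical names, and the sketch assumes that book names are unique, which will not hold for every data set.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper for illustration; not part of the example project.
class StableIdSketch {

    // Derives a name-based UUID from the book name, so the same name
    // always yields the same ID string.
    // Assumption: book names are unique within the index.
    static String idForName(String bookName) {
        byte[] nameBytes = bookName.getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(nameBytes).toString();
    }

    public static void main(String[] args) {
        String first = idForName("Treasure Island");
        String second = idForName("Treasure Island");
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

With something like this, book.setId(...) could receive the derived ID instead of a random one. The downside is that two different books with the same name would overwrite each other, so random IDs remain the safer default when names are not unique.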

Adding pagination and boosting

Assume we have an application where users are able to search for books. We need to find books whose name or description match the search query given by the user. For performance reasons we want to add some kind of pagination which shows only 10 search results at once to the user.

Let's create a new method in our repository interface for this:

Page<Book> findByNameOrDescription(@Boost(2) String name, String description, Pageable pageable);

The method name findByNameOrDescription tells Spring Data Solr to query for book objects whose name or description match the passed parameters. To support pagination we added the Pageable parameter and changed the return type from List<Book> to Page<Book>. By adding the @Boost annotation to the name parameter we are boosting books whose name matches the search parameter. This makes sense because those books are typically of higher interest to the user.

If we now want to query for the first page containing 10 elements we just have to do:

Page<Book> booksPage = bookRepository.findByNameOrDescription(searchString, searchString, new PageRequest(0, 10));

Besides the first 10 search results, Page<Book> provides some useful methods for building pagination functionality:

booksPage.getContent()       // get a list of (max) 10 books
booksPage.getTotalElements() // total number of elements (can be >10)
booksPage.getTotalPages()    // total number of pages
booksPage.getNumber()        // current page number
booksPage.isFirstPage()      // true if this is the first page
booksPage.hasNextPage()      // true if another page is available
booksPage.nextPageable()     // the pageable for requesting the next page
...
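If you wonder how these values relate to each other, the arithmetic can be sketched in a few lines of plain Java. Page numbers are zero-based, which is why PageRequest(0, 10) requests the first page. PaginationSketch and its methods are hypothetical names for this illustration and not the Spring Data implementation. The sketch assumes a positive page size.

```java
// Hypothetical illustration of the pagination arithmetic;
// not the Spring Data implementation.
class PaginationSketch {

    // Number of pages needed to show totalElements results,
    // rounding up so a partially filled last page is counted.
    static int totalPages(long totalElements, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (int) ((totalElements + pageSize - 1) / pageSize);
    }

    // True if there is a page after the given zero-based page number.
    static boolean hasNextPage(int pageNumber, long totalElements, int pageSize) {
        return pageNumber + 1 < totalPages(totalElements, pageSize);
    }

    public static void main(String[] args) {
        // 23 matching books, 10 per page: pages 0, 1 and 2
        System.out.println(totalPages(23, 10));     // prints "3"
        System.out.println(hasNextPage(0, 23, 10)); // prints "true"
        System.out.println(hasNextPage(2, 23, 10)); // prints "false"
    }
}
```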

Faceting

Whenever a user searches for a book name we want to show them how many books matching the given query parameter are available in the different categories. This feature is called faceted search and is directly supported by Spring Data Solr. We just have to add another method to our repository interface:

@Query("name:?0")
@Facet(fields = { "categories_txt" }, limit = 5)
FacetPage<Book> findByNameAndFacetOnCategories(String name, Pageable page);

This time the query will be derived from the @Query annotation (containing the Solr query) instead of the method name. With the @Facet annotation we tell Spring Data Solr to facet books by categories and return the first five facets.

It would also be possible to remove the @Query annotation and change the method name to findByName for the same effect. The small disadvantage of this approach is that it is not obvious to the caller that this repository method performs faceting. Additionally, the method signature might collide with other methods that search books by name.

Usage:

FacetPage<Book> booksFacetPage = bookRepository.findByNameAndFacetOnCategories(bookName, new PageRequest(0, 10));

booksFacetPage.getContent(); // the first 10 books

for (Page<? extends FacetEntry> page : booksFacetPage.getAllFacets()) {
  for (FacetEntry facetEntry : page.getContent()) {
    String categoryName = facetEntry.getValue();  // name of the category
    long count = facetEntry.getValueCount();      // number of books in this category
    
    // convert the category name back to an enum
    Category category = Category.valueOf(categoryName.toUpperCase());
  }
}

Note that booksFacetPage.getAllFacets() returns a Collection of FacetEntry pages. This is because the @Facet annotation allows you to facet on multiple fields at once. Each of these pages contains at most five FacetEntry objects (defined by the limit attribute of @Facet).
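One detail in the loop above deserves a second look: Category.valueOf() throws an IllegalArgumentException if the facet value does not correspond to an enum constant. The toUpperCase() call is there because the facet values of an analyzed text field like categories_txt come back as indexed terms, which I would expect to be lowercase with the text_general field type. A more defensive conversion could look like the following sketch. FacetValueSketch and toCategory are hypothetical names, and the Category enum is repeated only to keep the example self-contained.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the example project.
class FacetValueSketch {

    // Same constants as the Category enum used for the Book document.
    enum Category { EDUCATION, HISTORY, HUMOR, TECHNOLOGY, ROMANCE, ADVENTURE }

    // Converts a facet value back to a Category.
    // Returns an empty Optional for null or unknown values
    // instead of throwing an exception.
    static Optional<Category> toCategory(String facetValue) {
        if (facetValue == null) {
            return Optional.empty();
        }
        try {
            String constantName = facetValue.trim().toUpperCase(Locale.ROOT);
            return Optional.of(Category.valueOf(constantName));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(toCategory("adventure").isPresent()); // prints "true"
        System.out.println(toCategory("pirates").isPresent());   // prints "false"
    }
}
```

Using Locale.ROOT keeps the upper-casing independent of the default locale of the JVM.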

Highlighting

Often it is useful to highlight the occurrences of the search query in the list of search results (as Google or Bing do). This can be achieved with the highlighting feature of (Spring Data) Solr.

Let's add another repository method:

@Highlight(prefix = "<highlight>", postfix = "</highlight>")
HighlightPage<Book> findByDescription(String description, Pageable pageable);

The @Highlight annotation tells Solr to highlight the occurrences of the searched description.

Usage:

HighlightPage<Book> booksHighlightPage = bookRepository.findByDescription(description, new PageRequest(0, 10));

booksHighlightPage.getContent(); // first 10 books

for (HighlightEntry<Book> he : booksHighlightPage.getHighlighted()) {
  // A HighlightEntry belongs to an Entity (Book) and may have multiple highlighted fields (description)
  for (Highlight highlight : he.getHighlights()) {
    // Each highlight might have multiple occurrences within the description
    for (String snipplet : highlight.getSnipplets()) {
      // snipplet contains the highlighted text
    }
  }
}

If we use this repository method to query for books whose description contains the string Treasure Island, a snippet might look like this:

<highlight>Treasure Island</highlight> is a tale of pirates and villains, maps, treasure and shipwreck, and is perhaps one of the best adventure story ever written.

In this case Treasure Island is located at the beginning of the description and is highlighted with the prefix and postfix defined in the @Highlight annotation. This additional markup can be used to mark query occurrences when the search results are shown to the user.
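How the markup is processed afterwards is up to the application. As a rough sketch, the following plain Java code extracts the highlighted parts of a snippet and swaps the custom tags for <em> tags. HighlightSnippetSketch and its methods are hypothetical names, the choice of <em> is just an assumption for the example, and the sketch does not escape any other HTML that may be contained in the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Spring Data Solr.
class HighlightSnippetSketch {

    // Matches the prefix and postfix configured in the @Highlight annotation.
    private static final Pattern HIGHLIGHT =
            Pattern.compile("<highlight>(.*?)</highlight>", Pattern.DOTALL);

    // Collects the text of every highlighted occurrence in the snippet.
    static List<String> highlightedTerms(String snippet) {
        List<String> terms = new ArrayList<>();
        Matcher matcher = HIGHLIGHT.matcher(snippet);
        while (matcher.find()) {
            terms.add(matcher.group(1));
        }
        return terms;
    }

    // Replaces the custom highlight tags with <em> tags.
    // Note: other text in the snippet is not HTML-escaped here.
    static String toHtml(String snippet) {
        return HIGHLIGHT.matcher(snippet).replaceAll("<em>$1</em>");
    }

    public static void main(String[] args) {
        String snippet = "<highlight>Treasure Island</highlight> is a tale of pirates";
        System.out.println(highlightedTerms(snippet)); // prints "[Treasure Island]"
        System.out.println(toHtml(snippet));
        // prints "<em>Treasure Island</em> is a tale of pirates"
    }
}
```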

Conclusion

Spring Data Solr provides a very simple way to integrate Solr into Spring applications. With the repository abstraction it follows the same design principle as most other Spring Data projects. The only small drawback I faced while playing around with Spring Data Solr was the documentation, which could be improved here and there.

You can find the complete source code for this example on GitHub.

Comments

  • edgar - Wednesday, 26 March, 2014

    great - worked out of the box. I added a few more log statements to see results but that was all. Many thanks.

  • Kupster - Friday, 10 October, 2014

    Where's bean for:
    @Autowired
    private BookRepository bookRepository;
    ?
    It's not working here:
    "No qualifying bean of type [xx.xxx.xxx.OffersRepository] found for dependency"

  • Michael Scharhag - Friday, 10 October, 2014

    Hi Kupster,

    BookRepository is an interface (see: https://github.com/mscharhag/Spring-data-solr-example/blob/master/src/main/java/com/mscharhag/solr/repository/BookRepository.java).

    The implementation for this interface is generated by Spring Data Solr.

  • Kupster - Monday, 13 October, 2014

    Ok, now it's working but i dont know why, thanks for fast answer : )

  • Kumar Kumar - Monday, 5 January, 2015

    how to change the collection name. by default its taking 'collection1' and gives below messages

    Not adding test data to solr index. Data already exists
    Not adding test data to solr index. Data already exists
    Not adding test data to solr index. Data already exists
    Not adding test data to solr index. Data already exists

  • jb62 - Monday, 14 September, 2015

    Surely a matching SOLR core has to be set up first? No indication of how to do that? Downloaded the project from Git and all tests fail, of course. What am I missing?

  • Michael Scharhag - Saturday, 26 September, 2015

    Hi jb62,
    as described in the "Configuration" section, you need to set up a local Solr instance first.

  • jb62 - Tuesday, 6 October, 2015

    Thank you Michael - I was running into the breaking changes with SOLR 5... I've since gotten a 4.x version of SOLR to work with.

    I'm having the same problem with the interface that Kupster did, but your explanation helps me understand what is supposed to happen anyway...

  • jb62 - Wednesday, 7 October, 2015

    Just a big thanks Michael - this project and it's documentation (unlike the one on Spring's site) was totally understandable and straightforward. Thanks for the clear documentation! I've got a gig building SOLR search from scratch for a particular company and had been out of the SOLR loop for a while, this really helped me get the ball rolling. Oh, and as with Kupster, that problem finally went away - although I ended up blowing the project away and re-downloading yours from git.

  • Ramya - Tuesday, 13 December, 2016

    Where do I give bean definition if I want to create my own JobsRepository.I created JobsRepositoryTests and when I try to instantiate JobsRepository,I get error "Error creating bean with name 'com.apple.solr.repository.JobRepositoryTests': Injection of resource dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'jobPositionRepository': FactoryBean threw exception on object creation; nested exception is org.springframework.data.mapping.PropertyReferenceException: No property description found for type com.apple.solr.document.JobPosition"
