mscharhag, Programming and Stuff;

A blog about programming and software development topics, mostly focused on Java technologies including Java EE, Spring and Grails.

  • Tuesday, 15 September, 2020

    Implementing the Proxy Pattern in Java

    The Proxy Pattern

    Proxy is a common software design pattern. Wikipedia does a good job describing it like this:

    [..] In short, a proxy is a wrapper or agent object that is being called by the client to access the real serving object behind the scenes. Use of the proxy can simply be forwarding to the real object, or can provide additional logic. [..]

    (Wikipedia)

    UML class diagram:

    proxy pattern

    A client requires a Subject (typically an interface). This subject is implemented by a real implementation (here: RealSubject). A proxy implements the same interface and delegates operations to the real subject while adding its own functionality.

    In the next sections we will see how this pattern can be implemented in Java.

    Creating a simple proxy

    We start with an interface UserProvider (the Subject in the above diagram):

    public interface UserProvider {
        User getUser(int id);
    }

    This interface is implemented by UserProviderImpl (the real implementation):

    public class UserProviderImpl implements UserProvider {
        @Override
        public User getUser(int id) {
            return ...
        }
    }

    UserProvider is used by UsefulService (the client):

    public class UsefulService {
        private final UserProvider userProvider;
    
        public UsefulService(UserProvider userProvider) {
            this.userProvider = userProvider;
        }
        
        // useful methods
    }

    To initialize a UsefulService instance we just have to pass a UserProvider object to the constructor:

    UserProvider userProvider = new UserProviderImpl();
    UsefulService service = new UsefulService(userProvider);
    
    // use service

    Now let's add a Proxy object for UserProvider that does some simple logging:

    public class LoggingUserProviderProxy implements UserProvider {
        private final UserProvider userProvider;
    
        public LoggingUserProviderProxy(UserProvider userProvider) {
            this.userProvider = userProvider;
        }
    
        @Override
        public User getUser(int id) {
            System.out.println("Retrieving user with id " + id);
            return userProvider.getUser(id);
        }
    }

    We want to create a proxy for UserProvider, so our proxy needs to implement UserProvider. Within the constructor we accept the real UserProvider implementation. In the getUser(..) method we first write a message to standard out before we delegate the method call to the real implementation.

    To use our Proxy we have to update our initialization code:

    UserProvider userProvider = new UserProviderImpl();
    LoggingUserProviderProxy loggingProxy = new LoggingUserProviderProxy(userProvider);
    UsefulService usefulService = new UsefulService(loggingProxy);
    
    // use service

    Now, whenever UsefulService calls getUser(..), we see a console message before a User object is returned from UserProviderImpl. With the Proxy pattern we were able to add logging without modifying either the client (UsefulService) or the real implementation (UserProviderImpl).

    The problem with manual proxy creation

    The previous solution has a major downside: our Proxy implementation is bound to the UserProvider interface and is therefore hard to reuse.

    Proxy logic is often quite generic. Typical use-cases for proxies include caching, access to remote objects or lazy loading.

    However, a proxy needs to implement a specific interface (and its methods). This conflicts with reusability.
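
    To make the reuse problem concrete, here is a sketch of a second hand-written proxy, this time for caching. User and UserProvider are minimal stand-ins so the snippet compiles on its own, and CachingUserProviderProxy is a hypothetical name. Note that it is again tied to UserProvider and not thread-safe:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-ins for the types used in this post.
class User {
    final int id;

    User(int id) {
        this.id = id;
    }
}

interface UserProvider {
    User getUser(int id);
}

// Hypothetical caching proxy; like the logging proxy it only works for UserProvider.
class CachingUserProviderProxy implements UserProvider {
    private final UserProvider userProvider;
    private final Map<Integer, User> cache = new HashMap<>();

    CachingUserProviderProxy(UserProvider userProvider) {
        this.userProvider = userProvider;
    }

    @Override
    public User getUser(int id) {
        // Only delegate to the real implementation on a cache miss.
        User user = cache.get(id);
        if (user == null) {
            user = userProvider.getUser(id);
            cache.put(id, user);
        }
        return user;
    }
}
```

    The caching logic itself knows nothing about users, yet we would have to write it again for every other interface we want to cache.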

    Solution: JDK Dynamic Proxies

    The JDK provides a standard solution to this problem, called Dynamic Proxies. Dynamic Proxies let us create an implementation for a specific interface at runtime. Method calls on this generated proxy are delegated to an InvocationHandler.

    With Dynamic Proxies the proxy creation looks like this:

    UserProvider userProvider = new UserProviderImpl();
    UserProvider proxy = (UserProvider) Proxy.newProxyInstance(
            UserProvider.class.getClassLoader(),
            new Class[]{ UserProvider.class },
            new LoggingInvocationHandler(userProvider)
    );
    UsefulService usefulService = new UsefulService(proxy);

    With Proxy.newProxyInstance(..) we create a new proxy object. This method takes three arguments:

    • The classloader that should be used
    • A list of interfaces that the proxy should implement (here UserProvider)
    • An InvocationHandler implementation

    InvocationHandler is an interface with a single method: invoke(..). This method is called whenever a method on the proxy object is called.

    Our simple LoggingInvocationHandler looks like this:

    public class LoggingInvocationHandler implements InvocationHandler {
    
        private final Object invocationTarget;
    
        public LoggingInvocationHandler(Object invocationTarget) {
            this.invocationTarget = invocationTarget;
        }
    
        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            System.out.println(String.format("Calling method %s with args: %s",
                    method.getName(), Arrays.toString(args)));
            return method.invoke(invocationTarget, args);
        }
    }

    The invoke(..) method has three parameters:

    • The proxy object on which a method has been called
    • The method that has been called
    • A list of arguments that has been passed to the called method

    We first log the method and the arguments to stdout. Next we delegate the method call to the object that has been passed in the constructor (note we passed the real implementation in the previous snippet).
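
    One detail worth knowing: Method.invoke(..) wraps any exception thrown by the target method in an InvocationTargetException. If the handler lets that escape, the caller typically sees an UndeclaredThrowableException instead of the original exception. A possible variation of the handler that rethrows the original cause is sketched below. UnwrappingLoggingHandler and Greeter are hypothetical names used only for illustration:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;

// Hypothetical interface, only here so the sketch can be tried out.
interface Greeter {
    String greet(String name);
}

// Variation of the logging handler that unwraps exceptions thrown by the target.
class UnwrappingLoggingHandler implements InvocationHandler {
    private final Object invocationTarget;

    UnwrappingLoggingHandler(Object invocationTarget) {
        this.invocationTarget = invocationTarget;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        System.out.println("Calling method " + method.getName()
                + " with args: " + Arrays.toString(args));
        try {
            return method.invoke(invocationTarget, args);
        } catch (InvocationTargetException e) {
            // Rethrow what the target method actually threw.
            throw e.getCause();
        }
    }
}
```

    With this change a caller that catches, for example, IllegalStateException from the real implementation keeps working when the proxy is placed in between.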

    The separation of proxy creation (and interface implementation) and proxy logic (via InvocationHandler) supports reusability. Note that we do not have any dependency on the UserProvider interface in our InvocationHandler implementation. In the constructor we accept a generic Object. This gives us the option to reuse the InvocationHandler implementation for different interfaces.
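
    As a rough illustration of this reuse, the following sketch applies one handler class to two unrelated interfaces. PriceCalculator, Notifier, RecordingInvocationHandler and the Proxies.wrap(..) helper are all hypothetical. The handler records method names in a list instead of printing them, so the effect is easy to check:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Two hypothetical, unrelated interfaces.
interface PriceCalculator {
    int priceInCents(String productId);
}

interface Notifier {
    void send(String message);
}

// Same structure as LoggingInvocationHandler, but it collects the method names.
class RecordingInvocationHandler implements InvocationHandler {
    final List<String> calls = new ArrayList<>();
    private final Object invocationTarget;

    RecordingInvocationHandler(Object invocationTarget) {
        this.invocationTarget = invocationTarget;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        calls.add(method.getName());
        return method.invoke(invocationTarget, args);
    }
}

final class Proxies {
    private Proxies() {
    }

    // Hypothetical helper that hides the Class[] boilerplate and the cast.
    static <T> T wrap(Class<T> type, InvocationHandler handler) {
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[]{ type }, handler);
        return type.cast(proxy);
    }
}
```

    A call like Proxies.wrap(PriceCalculator.class, new RecordingInvocationHandler(realCalculator)) returns a PriceCalculator, and the same handler class can wrap a Notifier without any changes.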

    Limitations of Dynamic Proxies

    Dynamic Proxies always require an interface. We cannot create proxies based on (abstract) classes.

    If this is really a big issue for you, you can look into the byte code manipulation library cglib. cglib creates proxies via subclassing and can therefore create proxies for classes without requiring an interface.

    Conclusion

    The Proxy Pattern can be quite powerful. It allows us to add functionality without modifying the real implementation or the client.

    Proxies are often used to add some generic functionality to existing classes. Examples include caching, access to remote objects, transaction management or lazy loading.

    With Dynamic Proxies we can separate proxy creation from proxy implementation. Proxy method calls are delegated to an InvocationHandler which can be re-used.

    Note that in some situations the Proxy Pattern can be quite similar to the Decorator pattern (see this Stack Overflow discussion).

     

  • Wednesday, 9 September, 2020

    Quick tip: Referencing other Properties in Spring

    In Spring property (or yaml) files we can reference other properties using the ${..} syntax.

    For example:

    external.host=https://api.external.com
    external.productService=${external.host}/product-service
    external.orderService=${external.host}/order-service
    

    If we now access the external.productService property (e.g. by using the @Value annotation) we will get the value https://api.external.com/product-service.

    For example:

    @Value("${external.productService}")
    private String productServiceUrl; // https://api.external.com/product-service
    

    This way we can avoid duplication of commonly used values in property and yaml files.

  • Wednesday, 2 September, 2020

    REST: Dealing with Pagination

    In a previous post we learned how to retrieve resource collections. When those collections become larger, it is often useful to provide a way for clients to retrieve partial collections.

    Assume we provide a REST API for painting data. Our database might contain thousands of paintings. However, a web interface showing these paintings to users might only be able to show ten paintings at the same time. To view the next paintings the user needs to navigate to the next page which shows the following ten paintings. This process of dividing the content into smaller consumable sections (pages) is called Pagination.

    Pagination can be an essential part of your API if you are dealing with large collections.

    In the following sections we will look at different types of pagination.

    Using page and size parameters

    The page parameter tells which page should be returned while size indicates how many elements a page should contain.

    For example, this might return the first page, containing 10 painting resources.

    GET /paintings?page=1&size=10

    To get the next page we simply increase the page parameter by one.

    Unfortunately it is not always clear if pages start counting with 0 or 1, so make sure to document this properly.

    (In my opinion 1 should be preferred because this represents the natural page counting)

    A minor issue with this approach might be that the client cannot change the size parameter for a specific page.

    For example, after getting the first 10 items of a collection by issuing

    GET /paintings?page=1&size=10

    we cannot get the second page with a size of 15 by requesting:

    GET /paintings?page=2&size=15

    This will return the items 15-29 of the collection (counting from 0). So, we missed 5 items (10-14).
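
    The arithmetic behind this can be sketched in a few lines. The snippet assumes one-based page numbers and zero-based item indexes; PageMath is a hypothetical helper, not part of any framework:

```java
final class PageMath {
    private PageMath() {
    }

    // Zero-based index of the first item on a one-based page.
    static int firstIndex(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    // Zero-based index of the last item on the page (inclusive),
    // ignoring the actual size of the collection.
    static int lastIndex(int page, int size) {
        return firstIndex(page, size) + size - 1;
    }
}
```

    For page=1&size=10 this gives the indexes 0 to 9. For page=2&size=15 it gives 15 to 29, which is why the items 10 to 14 are never returned.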

    Using offset and limit parameters

    Another, but very similar approach is the use of offset and limit parameters. offset tells the server the number of items that should be skipped, while limit indicates the number of items to be returned.

    For example, this might return the first 10 painting resources:

    GET /paintings?offset=0&limit=10

    An offset parameter of 0 means that no elements should be skipped.

    We can get the following 10 resources by skipping the first 10 resources (= setting the offset to 10):

    GET /paintings?offset=10&limit=10

    This approach is a bit more flexible because offset and limit do not affect each other. So we can increase the limit for a specific page. We just need to make sure to adjust the offset parameter for the next page request accordingly.

    For example, this can be useful if a client displays data using an infinite scrollable list. If the user scrolls faster the client might request a larger chunk of resources with the next request.
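
    On the server side, offset and limit map directly to a sub-list. The following is a minimal in-memory sketch (a real API would usually push this down to the database); OffsetPagination and slice(..) are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetPagination {
    private OffsetPagination() {
    }

    // Skips 'offset' items and returns at most 'limit' items.
    static <T> List<T> slice(List<T> items, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must not be negative");
        }
        // Clamp both ends so requests beyond the end return an empty list.
        int from = Math.min(offset, items.size());
        int to = (int) Math.min((long) from + limit, items.size());
        return new ArrayList<>(items.subList(from, to));
    }
}
```

    A client that first requests offset=0&limit=10 and then offset=10&limit=15 receives consecutive items without gaps, because the second offset equals the number of items already received.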

    The downsides?

    Both previous solutions can work fine. They are often very easy to implement. However, both share two downsides.

    Depending on the underlying database and data structure you might run into performance problems for large offsets / page numbers. This is often an issue for relational databases (see this Stack Overflow question for MySQL or this one for Postgres).

    Another problem is resource skipping caused by delete operations. Assume we request the first page by issuing:

    GET /paintings?page=1&size=10

    After we retrieved the response, someone deletes a resource that is located on the first page. Now we request the second page with:

    GET /paintings?page=2&size=10

    We now skipped one resource. Due to the deletion of a resource on the first page, all other resources in the collection move one position forward. The first resource of page two has moved to page one. 
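
    The effect can be reproduced with a small in-memory simulation. The sketch assumes 25 resources with the ids 1 to 25 and deletes id 3 after the first page has been fetched; SkippingDemo is a hypothetical class written only for this demonstration:

```java
import java.util.ArrayList;
import java.util.List;

final class SkippingDemo {
    private SkippingDemo() {
    }

    // Returns one page of ids, using one-based page numbers.
    static List<Integer> page(List<Integer> ids, int page, int size) {
        int from = Math.min((page - 1) * size, ids.size());
        int to = Math.min(from + size, ids.size());
        return new ArrayList<>(ids.subList(from, to));
    }

    // Ids that still exist but were never returned to the client.
    static List<Integer> missedIds() {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            ids.add(i);
        }
        List<Integer> seen = new ArrayList<>(page(ids, 1, 10));
        // Someone deletes a resource located on the first page.
        ids.remove(Integer.valueOf(3));
        seen.addAll(page(ids, 2, 10));
        seen.addAll(page(ids, 3, 10));

        List<Integer> missed = new ArrayList<>(ids);
        missed.removeAll(seen);
        return missed;
    }
}
```

    In this setup missedIds() returns a list containing only the id 11: it was the first resource of page two, moved to page one after the deletion, and was therefore never delivered.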

    Seek Pagination

    An approach to solve those downsides is called Seek Pagination. Here, we use resource identifiers to indicate the collection offset.

    For example, this might return the first five resources:

    GET /paintings?limit=5

    Response:

    [
        { "id" : 2, ... },
        { "id" : 3, ... },
        { "id" : 5, ... },
        { "id" : 8, ... },
        { "id" : 9, ... }
    ]

    To get the next five resources, we pass the id of the last resource we received:

    GET /paintings?last_id=9&limit=5

    Response:

    [
        { "id" : 10, ... },
        { "id" : 11, ... },
        { "id" : 13, ... },
        { "id" : 14, ... },
        { "id" : 17, ... }
    ]

    This way we can make sure we do not accidentally skip a resource.

    For a relational database this is now much simpler. It is very likely that we just have to compare the primary key to the last_id parameter. The resulting query probably looks similar to this:

    select * from painting where id > last_id order by id limit 5;
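
    The same logic can be sketched in memory, which makes it easy to see why deletions no longer cause skipped resources. SeekPagination and nextPage(..) are hypothetical names; a lastId of null stands for the first request without a last_id parameter:

```java
import java.util.List;
import java.util.stream.Collectors;

final class SeekPagination {
    private SeekPagination() {
    }

    // Mirrors "where id > last_id order by id limit n" on a list of ids.
    static List<Integer> nextPage(List<Integer> ids, Integer lastId, int limit) {
        return ids.stream()
                .filter(id -> lastId == null || id > lastId)
                .sorted()
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

    Because the next page is defined by the last id the client has seen, and not by a position in the collection, deleting a resource from an earlier page does not shift the following pages.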

    Response format

    When using JSON, partial results should be returned as a JSON object (instead of a JSON array). Besides the collection items, the total number of items should be included.

    Example response:

    {
        "total": 4321,
        "items": [
            {
                "id": 1,
                "name": "Mona Lisa",
                "artist": "Leonardo da Vinci"
            }, {
                "id": 2,
                "name": "The Starry Night",
                "artist": "Vincent van Gogh"
            }
        ]
    }
    

    When using page and size parameters it is also a good idea to return the total number of available pages.

    Hypermedia controls

    If you are using Hypermedia controls in your API you should also add links for first, last, next and previous pages. This helps decoupling the client from your pagination logic.

    For example:

    GET /paintings?offset=0&limit=10
    {
        "total": 4317,
        "items": [
            {
                "id": 1,
                "name": "Mona Lisa",
                "artist": "Leonardo da Vinci"
            }, {
                "id": 2,
                "name": "The Starry Night",
                "artist": "Vincent van Gogh"
            },
            ...
        ],
        "links": [
            { "rel": "self", "href": "/paintings?offset=0&limit=10" },
            { "rel": "next", "href": "/paintings?offset=10&limit=10" },
            { "rel": "last", "href": "/paintings?offset=4310&limit=10" },
            { "rel": "by-offset", "href": "/paintings?offset={offset}&limit=10" }
        ]
    }

    Note that we requested the first page. Therefore the first and previous links are missing. The by-offset link uses a URI Template, so the client can choose an arbitrary offset.
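
    How a server could compute these links is sketched below. This is only one possible approach: PaginationLinks is a hypothetical helper, it assumes offsets are multiples of the limit, and it follows the example above by omitting first and previous on the first page (and, likewise, next and last on the last page):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PaginationLinks {
    private PaginationLinks() {
    }

    // Builds a rel -> href map for an offset/limit based collection.
    static Map<String, String> build(String path, int offset, int limit, int total) {
        Map<String, String> links = new LinkedHashMap<>();
        links.put("self", href(path, offset, limit));
        if (offset > 0) {
            links.put("first", href(path, 0, limit));
            links.put("previous", href(path, Math.max(0, offset - limit), limit));
        }
        if (offset + limit < total) {
            // Offset of the last page that still contains at least one item.
            int lastOffset = ((total - 1) / limit) * limit;
            links.put("next", href(path, offset + limit, limit));
            links.put("last", href(path, lastOffset, limit));
        }
        return links;
    }

    private static String href(String path, int offset, int limit) {
        return path + "?offset=" + offset + "&limit=" + limit;
    }
}
```

    For a total of 4317 items and a limit of 10 the last link points to offset=4310, matching the example response.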

    Range headers and HTTP status 206 (partial content)

    So far we passed pagination options as request parameters. However, we can also follow an alternative approach using Range and Content-Range headers.

    In the next example request the client uses the Range-header to request the first 10 paintings:

    GET /paintings
    Range: items=0-9

    The Range header is used to request only specific parts of the resource and requires the following format:

    Range: <unit>=<range-start>-<range-end>

    With:

    • <unit> - The unit in which the range is specified. Often bytes is used. However, for APIs we can also use something like items.
    • <range-start> - Start of the requested range
    • <range-end> - End of the requested range

    The server responds to this request with HTTP status 206 (Partial Content) which requires a Content-Range header:

    HTTP/1.1 206 Partial Content
    Content-Range: items 0-9/34
    
    [
    	... first 10 items
    ]

    Within the Content-Range header the server communicates the offsets of the returned items and the total amount of items. The required format of Content-Range looks like this:

    Content-Range: <unit> <range-start>-<range-end>/<size>

    With:

    • <unit> - The unit of the returned range
    • <range-start> - Beginning of the range
    • <range-end> - End of the range
    • <size> - Total size of the collection. Can be * if the size is not known
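
    To give an idea of the parsing effort, here is a minimal sketch for a custom items unit. It assumes that both ends of the range are inclusive (as with byte ranges) and only supports the simple start-end form. ItemRange is a hypothetical class, and handling of unsatisfiable ranges is left out:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ItemRange {
    private static final Pattern RANGE = Pattern.compile("items=(\\d+)-(\\d+)");

    final int start; // inclusive
    final int end;   // inclusive (assumption, see above)

    private ItemRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Parses the value of a Range header such as "items=0-9".
    static ItemRange parse(String headerValue) {
        Matcher matcher = RANGE.matcher(headerValue.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unsupported range: " + headerValue);
        }
        int start = Integer.parseInt(matcher.group(1));
        int end = Integer.parseInt(matcher.group(2));
        if (end < start) {
            throw new IllegalArgumentException("Range end before start: " + headerValue);
        }
        return new ItemRange(start, end);
    }

    // Builds the Content-Range value, clamping the end to the last available item.
    String toContentRange(int total) {
        int lastAvailable = Math.min(end, total - 1);
        return "items " + start + "-" + lastAvailable + "/" + total;
    }
}
```

    With a collection of 34 items, ItemRange.parse("items=0-9").toContentRange(34) produces items 0-9/34.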

    While this approach can work fine, it is usually easier to work with query parameters than parsing Range and Content-Range headers. It is also not possible to provide hypermedia pagination links if we communicate pagination offsets within headers.

     

    Interested in more REST related articles? Have a look at my REST API design page.

  • Thursday, 27 August, 2020

    REST: Retrieving resources

    Retrieving resources is probably the simplest REST API operation. It is implemented by sending a GET request to an appropriate resource URI. Note that GET is a safe HTTP method, so a GET request is not allowed to change resource state. The response format is determined by Content-Negotiation.

    Retrieving collection resources

    Collections are retrieved by sending a GET request to a resource collection.

    For example, a GET request to /paintings might return a collection of painting resources:

    Request:

    GET /paintings
    Accept: application/json
    

    Response:

    HTTP/1.1 200 (Ok)
    Content-Type: application/json
    
    [
        {
            "id": 1,
            "name": "Mona Lisa"
        }, {
            "id": 2,
            "name": "The Starry Night"
        }
    ]
    

    The server indicates a successful response using the HTTP 200 status code (see: Common HTTP status codes).

    Note that it can be a good idea to use a JSON object instead of an array as root element. This allows additional collection information and Hypermedia links besides actual collection items.

    Example response:

    HTTP/1.1 200 (Ok)
    Content-Type: application/json
    
    {
        "total": 2,
        "lastUpdated": "2020-01-15T10:30:00",
        "items": [
            {
                "id": 1,
                "name": "Mona Lisa"
            }, {
                "id": 2,
                "name": "The Starry Night"
            }
        ],
        "_links": [
            { "rel": "self", "href": "/paintings" }
        ]
    }
    

    If the collection is empty the server should respond with HTTP 200 and an empty collection (instead of returning an error).

    For example:

    HTTP/1.1 200 (Ok)
    Content-Type: application/json
    
    {
        "total": 0,
        "lastUpdated": "2020-01-15T10:30:00",
        "items": [],
        "_links": [
            { "rel": "self", "href": "/paintings" }
        ]
    }
    

    Resource collections are often top level resources without an id (like /products or /paintings) but can also be sub-resources. For example, /artists/42/paintings might represent the collection of painting resources for the artist with id 42.

    Retrieving single resources

    Single resources are retrieved in the same way as collections. If the resource is part of a collection it is typically identified by the collection URI plus the resource id.

    For example, a GET request to /paintings/1 might return the painting with id 1:

    Request:

    GET /paintings/1
    Accept: application/json
    

    Response:

    HTTP/1.1 200 (Ok)
    Content-Type: application/json
    Last-Modified: Sat, 16 Feb 2019 12:34:56 GMT
    
    {
        "id": 1,
        "name": "Mona Lisa",
        "artist": "Leonardo da Vinci"
    }
    

    If no resource for the given id is available, HTTP 404 (Not found) should be returned.

    The Last-Modified header

    The previous example also includes a Last-Modified header which tells when the resource has been last modified.

    This gives the option for conditional requests using the If-Modified-Since and If-Unmodified-Since headers. The If-Modified-Since header helps to support caching while If-Unmodified-Since can be used to avoid the lost-update problem with concurrent resource updates.

    According to RFC 7232 (conditional requests):

    An origin server SHOULD send Last-Modified for any selected representation for which a last modification date can be reasonably and consistently determined, since its use in conditional requests and evaluating cache freshness (RFC7234) results in a substantial reduction of HTTP traffic on the Internet and can be a significant factor in improving service scalability and reliability.
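
    A server-side check for If-Modified-Since could look roughly like the following sketch. ConditionalGet and statusFor(..) are hypothetical names, and the dates used with it are only illustrative. The header is ignored if it does not contain a valid HTTP date, as RFC 7232 requires:

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

final class ConditionalGet {
    private ConditionalGet() {
    }

    // Returns 304 if the resource was not modified after the given date, else 200.
    static int statusFor(String ifModifiedSince, ZonedDateTime lastModified) {
        if (ifModifiedSince == null) {
            return 200;
        }
        ZonedDateTime since;
        try {
            since = ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME);
        } catch (DateTimeParseException e) {
            // An invalid date means the header is ignored.
            return 200;
        }
        // HTTP dates only have second precision.
        ZonedDateTime modified = lastModified.truncatedTo(ChronoUnit.SECONDS);
        return modified.isAfter(since) ? 200 : 304;
    }
}
```

    A client that cached the response with Last-Modified: Sat, 16 Feb 2019 12:34:56 GMT can send the same value in If-Modified-Since and receives 304 (Not Modified) without a response body as long as the resource has not changed.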


  • Monday, 24 August, 2020

    OCR in Java with Tess4J

    Optical character recognition (OCR) is the conversion of images containing text to machine-encoded text. A popular tool for this is the open source project Tesseract. Tesseract can be used as a standalone application from the command line. Alternatively it can be integrated into applications using its C++ API. For other programming languages various wrapper APIs are available. In this post we will use the Java wrapper Tess4J.

    Getting started

    We start with adding the Tess4J maven dependency to our project:

    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>4.5.2</version>
    </dependency>

    Next we need to make sure the native libraries required by Tess4J are accessible from our application. Tess4J jar files ship with native libraries included. However, they need to be extracted before they can be loaded. We can do this programmatically using a Tess4J utility method:

    File tmpFolder = LoadLibs.extractTessResources("win32-x86-64");
    System.setProperty("java.library.path", tmpFolder.getPath());

    With LoadLibs.extractTessResources(..) we can extract resources from the jar file to a local temp directory. Note that the argument (here win32-x86-64) depends on the system you are using. You can see available options by looking into the Tess4J jar file. We can instruct Java to load native libraries from the temp directory by setting the Java system property java.library.path.

    Another option to provide the libraries is to install Tesseract on your system. If you do not want to change the java.library.path property, you can also load the libraries manually using System.load(..).

    Next we need to provide language-dependent data files to Tesseract. These data files contain trained models for Tesseract's LSTM OCR engine and can be downloaded from GitHub. For example, to detect German text we have to download deu.traineddata (deu is the ISO 639-2 language code for German). We place one or more downloaded data files in the resources/data directory.

    Detecting Text

    Now we are ready to use Tesseract within our Java application. The following snippet shows a minimal example:

    Tesseract tesseract = new Tesseract();
    tesseract.setLanguage("deu");
    tesseract.setOcrEngineMode(1);
    
    Path dataDirectory = Paths.get(ClassLoader.getSystemResource("data").toURI());
    tesseract.setDatapath(dataDirectory.toString());
    
    BufferedImage image = ImageIO.read(Main.class.getResourceAsStream("/ocrexample.jpg"));
    String result = tesseract.doOCR(image);
    System.out.println(result);

    First we create a new Tesseract instance. We set the language we want to recognize (here: German). With setOcrEngineMode(1) we tell Tesseract to use the LSTM OCR engine.

    Next we set the data directory with setDatapath(..) to the directory containing our downloaded LSTM models (here: resources/data).

    Finally we load an example image from the classpath and use the doOCR(..) method to perform character recognition. As a result we get a String containing detected characters.

    For example, feeding Tesseract with this photo from the German Wikipedia OCR article might produce the following text output.

    ocr-example

    Text output:

    Grundsätzliches [Quelltext bearbeiten]
    Texterkennung ist deshalb notwendig, weil optische Eingabegeräte (Scanner oder Digitalkameras, aber
    auch Faxempfänger) als Ergebnis ausschließlich Rastergrafiken liefern können. d. h. in Zeiten und Spaten
    angeordnete Punkte unterschiedlicher Färbung (Pixel). Texterkennung bezeichnet dabei die Aufgabe, die so
    dargestellten Buchstaben als solche zu erkennen, dh. zu identifizieren und ihnen den Zahlenwert
    zuzuordnen, der ihnen nach üblicher Textcodierung zukommt (ASCII, Unicode). Automatische Texterkennung
    und OCR werden im deutschen Sprachraum oft als Synonym verwendet In technischer Hinsicht bezieht sich
    OCR jedoch nur auf den Teilbereich der Muster vergleiche von separierten Bildteilen als Kandidaten zur
    ( Erkennung von Einzelzeichen. Diesem OCR—Prozess geht eine globale Strukturerkennung voraus, in der
    zuerst Textblöcke von graphischen Elementen unterschieden, die Zeilenstrukturen erkannt und schließlich
    | Einzeizeichen separiert werden. Bei der Entscheidung, welches Zeichen vorliegt, kann über weitere
    \ . Algorithmen ein sprachlicher Kontext berücksichtigt werden
    

    Summary

    Tesseract is a popular open source project for OCR. With Tess4J we can access the Tesseract API in Java. A little bit of setup is required for loading native libraries and downloading Tesseract's LSTM data. After that it is quite easy to perform OCR in Java. If you are not happy with the recognized text, it is a good idea to have a look at the Improving the quality of the output section of the Tesseract documentation.

    You can find the source code for the shown example on GitHub.