mscharhag, Programming and Stuff

A blog about programming and software development topics, mostly focused on Java technologies including Java EE, Spring and Grails.

  • Wednesday, 28 July, 2021

    File down- and uploads in RESTful web services

    Usually we use standard data exchange formats like JSON or XML with REST web services. However, many REST services have at least some operations that can be hard to fulfill with just JSON or XML. Examples are uploads of product images, data imports using uploaded CSV files or generation of downloadable PDF reports.

    In this post we focus on those operations, which are often categorized as file down- and uploads. This categorization is a bit fuzzy, as sending a simple JSON document can also be seen as a (JSON) file upload operation.

    Think about the operation you want to express

    A common mistake is to focus on the specific file format that is required for the operation. Instead, we should think about the operation we want to express. The file format just decides the Media Type used for the operation.

    For example, assume we want to design an API that lets users upload an avatar image to their user account.

    Here, it is usually a good idea to separate the avatar image from the user account resource for various reasons:

    • The avatar image is unlikely to change, so it might be a good candidate for caching. On the other hand, the user account resource might contain things like the last login date, which changes frequently.
    • Not all clients accessing the user account might be interested in the avatar image. So, bandwidth can be saved.
    • For clients it is often preferable to load images separately (think of web applications using <img> tags)

    The user account resource might be accessible via:

    /users/<user-id>

    We can come up with a simple sub-resource representing the avatar image:

    /users/<user-id>/avatar

    Uploading an avatar is a simple replace operation which can be expressed via PUT:

    PUT /users/<user-id>/avatar
    Content-Type: image/jpeg
    
    <image data>
    

    In case a user wants to delete their avatar image, we can use a simple DELETE operation:

    DELETE /users/<user-id>/avatar
    

    And of course clients need a way to show the avatar image. So, we can provide a download operation with GET:

    GET /users/<user-id>/avatar
    

    which returns

    HTTP/1.1 200 OK
    Content-Type: image/jpeg
    
    <image data>
    

    In this simple example we use a new sub-resource with common update, delete, get operations. The only difference is we use an image media type instead of JSON or XML.
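
    As a rough client-side sketch, the three avatar operations could be built with the JDK's java.net.http API like this. The class name AvatarRequests and the host api.example.com are made up for illustration, and the requests are only constructed here, not sent:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) the three avatar requests described above.
// Class name and host are hypothetical placeholders.
class AvatarRequests {
    static final String BASE = "https://api.example.com";

    static URI avatarUri(String userId) {
        return URI.create(BASE + "/users/" + userId + "/avatar");
    }

    // PUT replaces the avatar; the media type describes the uploaded bytes
    static HttpRequest upload(String userId, byte[] jpegBytes) {
        return HttpRequest.newBuilder(avatarUri(userId))
                .header("Content-Type", "image/jpeg")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(jpegBytes))
                .build();
    }

    // GET downloads the avatar; Accept names the format the client wants
    static HttpRequest download(String userId) {
        return HttpRequest.newBuilder(avatarUri(userId))
                .header("Accept", "image/jpeg")
                .GET()
                .build();
    }

    // DELETE removes only the avatar sub-resource, not the user account
    static HttpRequest delete(String userId) {
        return HttpRequest.newBuilder(avatarUri(userId))
                .DELETE()
                .build();
    }
}
```

    To actually execute one of these requests, it can be passed to HttpClient.newHttpClient().send(..) together with a body handler such as HttpResponse.BodyHandlers.ofByteArray().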

    Let's look at a different example.

    Assume we provide an API to manage product data. We want to extend this API with an option to import products from an uploaded CSV file. Instead of thinking about file uploads we should think about a way to express a product import operation.

    Probably the simplest approach is to send a POST request to a separate resource:

    POST /product-import
    Content-Type: text/csv
    
    <csv data>
    

    Alternatively, we can also see this as a bulk operation for products. As we learned in another post about bulk operations with REST, the PATCH method is a possible way to express a bulk operation on a collection. In this case, the CSV document describes the desired changes to the product collection.

    For example:

    PATCH /products
    Content-Type: text/csv
    
    action,id,name,price
    create,,Cool Gadget,3.99
    create,,Nice cap,9.50
    delete,42,,
    

    This example creates two new products and deletes the product with id 42.
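
    On the server side, such a CSV patch document has to be translated into individual operations. The following is a minimal sketch of that step, not a complete import. The names ProductCsvPatch and Operation are invented for this example, and the parser assumes there are no quoted fields or embedded commas (a real import should rather use a CSV library):

```java
import java.util.ArrayList;
import java.util.List;

// Naive parser for the CSV patch format shown above (hypothetical names).
// Assumes a header row and exactly four plain, unquoted columns per line.
class ProductCsvPatch {

    static class Operation {
        final String action; // "create" or "delete"
        final String id;     // empty for "create"
        final String name;   // empty for "delete"
        final String price;  // kept as text, empty for "delete"

        Operation(String action, String id, String name, String price) {
            this.action = action;
            this.id = id;
            this.name = name;
            this.price = price;
        }
    }

    static List<Operation> parse(String csv) {
        List<Operation> operations = new ArrayList<>();
        String[] lines = csv.split("\\r?\\n");
        // lines[0] is the header row: action,id,name,price
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) {
                continue; // ignore blank lines
            }
            // limit -1 keeps trailing empty columns, e.g. "delete,42,,"
            String[] cols = lines[i].split(",", -1);
            if (cols.length != 4) {
                throw new IllegalArgumentException("Line " + (i + 1) + " must have 4 columns");
            }
            operations.add(new Operation(cols[0], cols[1], cols[2], cols[3]));
        }
        return operations;
    }
}
```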

    Processing file uploads can take a considerable amount of time. So think about designing it as an asynchronous REST operation.

    Mixing files and metadata

    In some situations we might need to attach additional metadata to a file. For example, assume we have an API where users can upload holiday photos. Besides the actual image data a photo might also contain a description, a location where it was taken and more.

    Here, I would (again) recommend using two separate operations for similar reasons as stated in the previous section with the avatar image. Even if the situation is a bit different here (the data is directly linked to the image) it is usually the simpler approach.

    In this case, we can first create a photo resource by sending the actual image:

    POST /photos
    Content-Type: image/jpeg
    
    <image data>

    As response we get:

    HTTP/1.1 201 Created
    Location: /photos/123

    After that, we can attach additional metadata to the photo:

    PUT /photos/123/metadata
    Content-Type: application/json
    
    {
        "description": "Nice shot of a beach in hawaii",
        "location": "hawaii",
        "filename": "hawaii-beach.jpg"
    }
    

    Of course we can also design it the other way around and send the metadata before the image.

    Embedding Base64 encoded files in JSON or XML

    In case splitting file content and metadata into separate requests is not possible, we can embed files into JSON / XML documents using Base64 encoding. With Base64 encoding we can convert binary formats to a text representation which can be integrated into other text based formats, like JSON or XML.

    An example request might look like this:

    POST /photos
    Content-Type: application/json
    
    {
        "width": "1280",
        "height": "920",
        "filename": "funny-cat.jpg",
        "image": "TmljZSBleGFt...cGxlIHRleHQ="
    }
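
    A small sketch of how such a payload could be produced and read with java.util.Base64 is shown below. The class name Base64PhotoPayload is an assumption, and the JSON is concatenated by hand only to keep the example free of dependencies. Keep in mind that Base64 turns every 3 bytes into 4 characters, so the embedded file grows by roughly a third.

```java
import java.util.Base64;

// Hypothetical helper that embeds binary image data in a JSON document.
class Base64PhotoPayload {

    // Builds the JSON by hand to stay dependency-free; a real client would
    // use a JSON library. The filename is assumed to need no escaping.
    static String toJson(String filename, byte[] imageBytes) {
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "{ \"filename\": \"" + filename + "\", \"image\": \"" + encoded + "\" }";
    }

    // Receiving side: turn the text representation back into the original bytes
    static byte[] decodeImage(String base64) {
        return Base64.getDecoder().decode(base64);
    }
}
```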

    Mixing media-types with multipart requests

    Another possible approach to transfer image data and metadata in a single request / response is to use multipart media types.

    Multipart media types require a boundary parameter that is used as delimiter between different body parts. The following request consists of two body parts. The first one contains the image while the second part contains the metadata.

    For example:

    POST /photos
    Content-Type: multipart/mixed; boundary=foobar
    
    --foobar
    Content-Type: image/jpeg
    
    <image data>
    --foobar
    Content-Type: application/json
    
    {
        "width": "1280",
        "height": "920",
        "filename": "funny-cat.jpg"
    }
    --foobar--

    Unfortunately multipart requests / responses are often hard to work with. For example, not every REST client might be able to construct these requests and it can be hard to verify responses in unit tests.
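
    If no library support is available, a multipart/mixed body like the one above can be assembled manually. The following sketch shows the idea; MultipartMixedBody is a made-up name, and the code assumes that the chosen boundary does not occur inside any of the parts:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that assembles a two-part multipart/mixed body:
// the image first, the JSON metadata second.
class MultipartMixedBody {
    private static final String CRLF = "\r\n";

    static byte[] build(String boundary, byte[] image, String metadataJson) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writePart(out, boundary, "image/jpeg", image);
        writePart(out, boundary, "application/json",
                metadataJson.getBytes(StandardCharsets.UTF_8));
        // closing delimiter: the boundary followed by two extra hyphens
        write(out, "--" + boundary + "--" + CRLF);
        return out.toByteArray();
    }

    private static void writePart(ByteArrayOutputStream out, String boundary,
                                  String contentType, byte[] content) {
        write(out, "--" + boundary + CRLF);
        write(out, "Content-Type: " + contentType + CRLF);
        write(out, CRLF); // blank line separates part headers from part content
        out.write(content, 0, content.length);
        write(out, CRLF);
    }

    private static void write(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }
}
```

    The resulting byte array would be sent with the header Content-Type: multipart/mixed; boundary=foobar (or whatever boundary was passed in).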

    Interested in more REST related articles? Have a look at my REST API design page.

  • Sunday, 27 June, 2021

    Kotlin: Type conversion with adapters

    In this post we will learn how we can use Kotlin extension functions to provide a simple and elegant type conversion mechanism.

    Maybe you have used Apache Sling before. In this case, you are probably familiar with Sling's usage of adapters. We will implement a very similar approach in Kotlin.

    Creating an extension function

    With Kotlin's extension functions we can add methods to existing classes. The following declaration adds an adaptTo() method to all subtypes of Any.

    inline fun <reified T : Any> Any.adaptTo(): T {
        ..
    }
    

    The generic type parameter T specifies the target type that should be returned by the method. We keep the method body empty for the moment.

    Converting an object of type A to another object of type B will look like this with our new method:

    val a = A("foo")
    val b = a.adaptTo<B>()

    Providing conversion rules with adapters

    In order to implement the adaptTo() method we need a way to define conversion rules.

    We use a simple Adapter interface for this:

    import kotlin.reflect.KClass

    interface Adapter {
        fun <T : Any> canAdapt(from: Any, to: KClass<T>): Boolean
        fun <T : Any> adaptTo(from: Any, to: KClass<T>): T
    }

    canAdapt(..) returns true when the implementing class is able to convert the from object to type to.

    adaptTo(..) performs the actual conversion and returns an object of type to.

    Searching for an appropriate adapter

    Our adaptTo() extension function needs a way to access available adapters. So, we create a simple list that stores our adapter implementations:

    val adapters = mutableListOf<Adapter>()
    

    Within the extension function we can now search the adapters list for a suitable adapter:

    inline fun <reified T : Any> Any.adaptTo(): T {
        val adapter = adapters.find { it.canAdapt(this, T::class) }
                ?: throw NoSuitableAdapterFoundException(this, T::class)
        return adapter.adaptTo(this, T::class)
    }
    
    class NoSuitableAdapterFoundException(from: Any, to: KClass<*>)
        : Exception("No suitable adapter found to convert $from to type $to")
    
    

    If an adapter is found that can be used for the requested conversion we call adaptTo(..) of the adapter and return the result. In case no suitable adapter is found a NoSuitableAdapterFoundException is thrown.

    Example usage

    Assume we want to convert JSON strings to Kotlin objects using the Jackson JSON library. A simple adapter might look like this:

    class JsonToObjectAdapter : Adapter {
        private val objectMapper = ObjectMapper().registerModule(KotlinModule())
    
        override fun <T : Any> canAdapt(from: Any, to: KClass<T>) = from is String
    
        override fun <T : Any> adaptTo(from: Any, to: KClass<T>): T {
            require(canAdapt(from, to))
            return objectMapper.readValue(from as String, to.java)
        }
    }

    Now we can use our new extension method to convert a JSON string to a Person object:

    data class Person(val name: String, val age: Int)
    
    fun main() {
        // register available adapter at application start
        adapters.add(JsonToObjectAdapter())
    
        ...
        
        // actual usage
        val json = """
            {
                "name": "John",
                "age" : 42
            }
        """.trimIndent()
    
        val person = json.adaptTo<Person>()
    }

    You can find the source code of the examples on GitHub.

    Within adapters.kt you find all the required pieces in case you want to try this on your own. In example-usage.kt you find some adapter implementations and usage examples.

  • Sunday, 13 June, 2021

    Making POST and PATCH requests idempotent

    In an earlier post about idempotency and safety of HTTP methods we learned that idempotency is a positive API feature. It helps make an API more fault-tolerant as a client can safely retry a request in case of connection problems.

    The HTTP specification defines GET, HEAD, OPTIONS, TRACE, PUT and DELETE methods as idempotent. Of these methods, GET, PUT and DELETE are the ones that are usually used in REST APIs. Implementing GET, PUT and DELETE in an idempotent way is typically not a big problem.

    POST and PATCH are a bit different: neither of them is specified as idempotent. However, both can be implemented in an idempotent way, making it easier for clients to recover from problems. In this post we will explore different options to make POST and PATCH requests idempotent.

    Using a unique business constraint

    The simplest approach to provide idempotency when creating a new resource (usually expressed via POST) is a unique business constraint.

    For example, consider we want to create a user resource which requires a unique email address:

    POST /users
    
    {
        "name": "John Doe",
        "email": "john@doe.com"
    }

    If this request is accidentally sent twice by the client, the second request returns an error because a user with the given email address already exists. In this case, usually HTTP 400 (bad request) or HTTP 409 (conflict) is returned as status code.

    Note that the constraint used to provide idempotency does not have to be part of the request body. URI parts and relationships can also help form a unique constraint.

    A good example for this is a resource that relates to a parent resource in a one-to-one relation. For example, assume we want to pay an order with a given order-id.

    The payment request might look like this:

    POST /order/<order-id>/payment
    
    {
        ... (payment details)
    }

    An order can only be paid once so /payment is in a one-to-one relation to its parent resource /order/<order-id>. If there is already a payment present for the given order, the server can reject any further payment attempts.
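
    The unique email example can be sketched with an in-memory stand-in for a database table with a unique constraint. UserRegistry is a hypothetical class; in a real service the constraint would typically be enforced by the database:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for a user table with a unique email constraint
// (hypothetical class, for illustration only).
class UserRegistry {
    private final Map<String, String> namesByEmail = new ConcurrentHashMap<>();

    // Returns the HTTP status code an endpoint could respond with
    int createUser(String name, String email) {
        // putIfAbsent is atomic: only the first request with this email wins
        String existing = namesByEmail.putIfAbsent(email.toLowerCase(), name);
        return existing == null ? 201 : 409;
    }
}
```

    A retried request with the same email address receives 409 (conflict) and no second user is created.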

    Using ETags

    Entity tags (ETags) are a good approach to make update requests idempotent. ETags are generated by the server based on the current resource representation. The ETag is returned in the ETag response header. For example:

    Request

    GET /users/123

    Response

    HTTP/1.1 200 OK
    ETag: "a915ecb02a9136f8cfc0c2c5b2129c4b"
    
    {
        "name": "John Doe",
        "email": "john@doe.com"
    }

    Now assume we want to use a JSON Merge Patch request to update the users name:

    PATCH /users/123
    If-Match: "a915ecb02a9136f8cfc0c2c5b2129c4b"
    
    {
        "name": "John Smith"
    }

    We use the If-Match condition to tell the server only to execute the request if the ETag matches. Updating the resource leads to an updated ETag on the server side. So, if the request is accidentally sent twice, the server rejects the second request because the ETag no longer matches. Usually HTTP 412 (precondition failed) should be returned in this case.

    I explained ETags in a bit more detail in my post about avoiding issues with concurrent updates.

    Obviously ETags can only be used if the resource already exists. So this solution cannot be used to ensure idempotency when a resource is created. On the plus side, this is a standardized and very well understood approach.
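
    The server-side check behind If-Match can be sketched as follows. This is a simplified, in-memory illustration: the class UserResource is invented, the representation consists of the name only, and the ETag is assumed to be a SHA-256 hash of that representation (servers are free to derive ETags differently):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical in-memory resource with a conditional update.
class UserResource {
    private String name;

    UserResource(String name) {
        this.name = name;
    }

    // ETag derived from the current representation (here: just the name)
    synchronized String etag() {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(name.getBytes(StandardCharsets.UTF_8));
            StringBuilder tag = new StringBuilder("\"");
            for (byte b : hash) {
                tag.append(String.format("%02x", b));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation
            throw new IllegalStateException(e);
        }
    }

    // Returns 200 if the update was applied, 412 if the client's ETag is stale
    synchronized int patchName(String ifMatch, String newName) {
        if (!etag().equals(ifMatch)) {
            return 412; // precondition failed
        }
        name = newName;
        return 200;
    }
}
```

    Sending the same PATCH twice with the ETag obtained before the first update yields 200 for the first request and 412 for the second one.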

    Using a separate idempotency key

    Yet another approach is to use a separate client generated key to provide idempotency. With this approach the client generates a key and adds it to the request using a custom header (e.g. Idempotency-Key).

    For example, a request to create a new user might look like this:

    POST /users
    Idempotency-Key: 1063ef6e-267b-48fc-b874-dcf1e861a49d
    
    {
        "name": "John Doe",
        "email": "john@doe.com"
    }

    Now the server can persist the idempotency key and reject any further requests using the same key.

    There are two questions to think about with this approach:

    • How to deal with requests that have not been completed successfully (e.g. by returning HTTP 4xx or 5xx status codes)? Should the idempotency key be saved by the server in these cases? If so, clients always need to use a new idempotency key if they want to retry requests.
    • What to return if the server receives a request with an already known idempotency key?

    Personally I tend to save the idempotency key only if the request finished successfully. In the second case I would return HTTP 409 (conflict) to indicate that a request with the given idempotency key has already been executed.

    However, opinions can be different here. For example, the Stripe API makes use of an Idempotency-Key header. Stripe saves the idempotency key and the returned response in all cases. If a provided idempotency key is already present, the stored response gets returned without executing the operation again.

    The latter can confuse the client in my opinion. On the other hand, it gives the client the option to retrieve the response of a previously executed request again.
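
    The first variant (save the key only after a successful request, answer replays with HTTP 409) could be sketched like this. IdempotencyKeyFilter is a hypothetical name and the keys are kept in memory only; a real service would persist them, for example in a table with a unique constraint:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntSupplier;

// Hypothetical in-memory idempotency key handling. Keys are remembered only
// for successful requests, so failed requests can be retried with the same key.
class IdempotencyKeyFilter {
    private final Set<String> completedKeys = new HashSet<>();

    // The operation returns the HTTP status it produced.
    // synchronized keeps check and update atomic, at the cost of
    // serializing all requests (acceptable for a sketch only).
    synchronized int handle(String idempotencyKey, IntSupplier operation) {
        if (completedKeys.contains(idempotencyKey)) {
            return 409; // a request with this key has already been executed
        }
        int status = operation.getAsInt();
        if (status >= 200 && status < 300) {
            completedKeys.add(idempotencyKey);
        }
        return status;
    }
}
```

    A Stripe-like variant would store the response together with the key and return the stored response instead of 409.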

    Summary

    A simple unique business key can be used to provide idempotency for operations that create resources.

    For non-creating operations we can use server generated ETags combined with the If-Match header. This approach has the advantage of being standardized and widely known.

    As an alternative we can use a client generated idempotency key provided in a custom request header. The server saves those idempotency keys and rejects requests that contain an already used idempotency key. This approach can be used for all types of requests. However, it is not standardized and has some points to think about.

     

  • Sunday, 30 May, 2021

    Providing useful API error messages with Spring Boot

    For API users it is quite important that an API provides useful error messages. Otherwise, it can be hard to figure out why things do not work. Debugging what's wrong can quickly become a larger effort for the client than actually implementing useful error responses on the server side. This is especially true if clients are not able to solve the problem themselves and additional communication is required.

    Nonetheless this topic is often ignored or implemented halfheartedly.

    Client and security perspectives

    There are different perspectives on error messages. Detailed error messages are more helpful for clients while, from a security perspective, it is preferable to expose as little information as possible. Luckily those two views often do not conflict that much, when implemented correctly.

    Clients are usually interested in very specific error messages if the error is produced by them. This should usually be indicated by a 4xx status code. Here, we need specific messages that point to the mistake made by the client without exposing any internal implementation detail.

    On the other hand, if the client request is valid and the error is produced by the server (5xx status codes), we should be conservative with error messages. In this case, the client is not able to solve the problem and therefore does not require any details about the error.

    A response indicating an error should contain at least two things: A human readable message and an error code. The first one helps the developer who sees the error message in the log file. The latter allows specific error processing on the client (e.g. showing a specific error message to the user).
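
    Independent of any framework, the distinction between client and server errors can be sketched in plain Java. The classes ClientException and ApiError as well as the code INTERNAL_ERROR are assumptions made for this illustration: exceptions that represent client mistakes carry a status, an error code and a message written for the client, while everything else is mapped to a generic 500 response.

```java
// Hypothetical exception type for errors caused by the client (4xx).
// Its message is written for API users and must not contain internals.
class ClientException extends RuntimeException {
    final int status;
    final String code;

    ClientException(int status, String code, String clientMessage) {
        super(clientMessage);
        this.status = status;
        this.code = code;
    }
}

// Hypothetical error response: status, machine readable code, readable message.
class ApiError {
    final int status;
    final String code;
    final String message;

    ApiError(int status, String code, String message) {
        this.status = status;
        this.code = code;
        this.message = message;
    }

    static ApiError from(RuntimeException e) {
        if (e instanceof ClientException) {
            ClientException ce = (ClientException) e;
            // specific information helps the client to fix the request
            return new ApiError(ce.status, ce.code, ce.getMessage());
        }
        // unexpected server error: expose nothing from the exception itself
        return new ApiError(500, "INTERNAL_ERROR", "An unexpected error occurred");
    }
}
```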

    How to build a useful error response in a Spring Boot application?

    Assume we have a small application in which we can publish articles. A simple Spring controller to do this might look like this:

    @RestController
    public class ArticleController {
    
        @Autowired
        private ArticleService articleService;
    
        @PostMapping("/articles/{id}/publish")
        public void publishArticle(@PathVariable ArticleId id) {
            articleService.publishArticle(id);
        }
    }

    Nothing special here, the controller just delegates the operation to a service, which looks like this:

    @Service
    public class ArticleService {
    
        @Autowired
        private ArticleRepository articleRepository;
    
        public void publishArticle(ArticleId id) {
            Article article = articleRepository.findById(id)
                    .orElseThrow(() -> new ArticleNotFoundException(id));
    
            if (!article.isApproved()) {
                throw new ArticleNotApprovedException(article);
            }
    
            ...
        }
    }

    Inside the service we throw specific exceptions for possible client errors. Note that those exceptions do not just describe the error. They also carry information that might help us later to produce a good error message:

    public class ArticleNotFoundException extends RuntimeException {
        private final ArticleId articleId;
    
        public ArticleNotFoundException(ArticleId articleId) {
            super(String.format("No article with id %s found", articleId));
            this.articleId = articleId;
        }
        
        // getter
    }

    If the exception is specific enough we do not need a generic message parameter. Instead, we can define the message inside the exception constructor.

    Next we can use an @ExceptionHandler method in a @ControllerAdvice bean to handle the actual exception:

    @ControllerAdvice
    public class ArticleExceptionHandler {
    
        @ExceptionHandler(ArticleNotFoundException.class)
        public ResponseEntity<ErrorResponse> onArticleNotFoundException(ArticleNotFoundException e) {
            String message = String.format("No article with id %s found", e.getArticleId());
            return ResponseEntity
                    .status(HttpStatus.NOT_FOUND)
                    .body(new ErrorResponse("ARTICLE_NOT_FOUND", message));
        }
        
        ...
    }

    If controller methods throw exceptions, Spring tries to find a method annotated with a matching @ExceptionHandler annotation. @ExceptionHandler methods can have flexible method signatures, similar to standard controller methods. For example, we can add an HttpServletRequest parameter and Spring will pass in the current request object. Possible parameters and return types are described in the Javadocs of @ExceptionHandler.

    In this example, we create a simple ErrorResponse object that consists of an error code and a message.

    The message is constructed based on the data carried by the exception. It is also possible to pass the exception message to the client. However, in this case we need to make sure everyone in the team is aware of this and exception messages do not contain sensitive information. Otherwise, we might accidentally leak internal information to the client.

    ErrorResponse is a simple POJO used for JSON serialization:

    public class ErrorResponse {
        private final String code;
        private final String message;
    
        public ErrorResponse(String code, String message) {
            this.code = code;
            this.message = message;
        }
    
        // getter
    }

    Testing error responses

    A good test suite should not miss tests for specific error responses. In our example we can verify error behaviour in different ways. One way is to use a Spring MockMvc test.

    For example:

    @SpringBootTest
    @AutoConfigureMockMvc
    public class ArticleExceptionHandlerTest {
    
        @Autowired
        private MockMvc mvc;
    
        @MockBean
        private ArticleRepository articleRepository;
    
        @Test
        public void articleNotFound() throws Exception {
            when(articleRepository.findById(new ArticleId("123"))).thenReturn(Optional.empty());
    
            mvc.perform(post("/articles/123/publish"))
                    .andExpect(status().isNotFound())
                    .andExpect(jsonPath("$.code").value("ARTICLE_NOT_FOUND"))
                    .andExpect(jsonPath("$.message").value("No article with id 123 found"));
        }
    }


    Here, we use a mocked ArticleRepository that returns an empty Optional for the passed id. We then verify if the error code and message match the expected strings.

    In case you want to learn more about testing Spring applications with MockMvc: I recently wrote an article showing how to improve MockMvc tests.

    Summary

    Useful error messages are an important part of an API.

    If errors are produced by the client (HTTP 4xx status codes) servers should provide a descriptive error response containing at least an error code and a human readable error message. Responses for unexpected server errors (HTTP 5xx) should be conservative to avoid accidental exposure of any internal information.

    To provide useful error responses we can use specific exceptions that carry related data. Within @ExceptionHandler methods we then construct error messages based on the exception data.

  • Monday, 3 May, 2021

    Supporting bulk operations in REST APIs

    Bulk (or batch) operations are used to perform an action on more than one resource in a single request. This can help reduce networking overhead. For network performance it is usually better to make fewer requests instead of more requests with less data.

    However, before adding support for bulk operations you should think twice if this feature is really needed. Often network performance is not what limits request throughput. You should also consider techniques like HTTP pipelining as an alternative to improve performance.

    When implementing bulk operations we should differentiate between two different cases:

    • Bulk operations that group together many arbitrary operations in one request. For example: Delete product with id 42, create a user named John and retrieve all product-reviews created yesterday.
    • Bulk operations that perform one operation on different resources of the same type. For example: Delete the products with id 23, 45, 67 and 89.

    In the next section we will explore different solutions that can help us with both situations. Be aware that the shown solutions might not look very REST-like. Bulk operations in general are not very compatible with REST constraints as we operate on different resources with a single request. So there simply is no real REST solution.

    In the following examples we will always return a synchronous response. However, as bulk operations usually take longer to process it is likely you are also interested in an asynchronous processing style. In this case, my post about asynchronous operations with REST might also be interesting to you.

    Expressing multiple operations within the request body

    An approach that quickly comes to mind is to use a standard data format like JSON to define a list of desired operations.

    Let's start with a simple example request:

    POST /batch
    
    [
        {
            "path": "/products",
            "method": "post",
            "body": {
                "name": "Cool Gadget",
                "price": "$ 12.45 USD"
            }
        }, {
            "path": "/users/43",
            "method": "put",
            "body": {
                "name": "Paul"
            }
        },
        ...
    ]

    We use a generic /batch endpoint that accepts a simple JSON format to describe desired operations using URIs and HTTP methods. Here, we want to execute a POST request to /products and a PUT request to /users/43.

    A response body for the shown request might look like this:

    [
        {
            "path": "/products",
            "method": "post",
            "body": {
                "id": 123,
                "name": "Cool Gadget",
                "price": "$ 12.45 USD"
            },
            "status": 201
        }, {
            "path": "/users/43",
            "method": "put",
            "body": {
                "id": 43,
                "name": "Paul"
            },
            "status": 200
        },
        ...
    ]

    For each requested operation we get a result object containing the URI and HTTP method again. Additionally we get the status code and response body for each operation.
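
    How a server might process such a list is sketched below, under several assumptions: the JSON has already been parsed into Operation objects, handlers are looked up by exact method and path (no path variables), and one failing operation does not abort the others. All names (BatchDispatcher, Operation, Result) are invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher for a generic /batch endpoint.
class BatchDispatcher {

    static class Operation {
        final String method;
        final String path;
        final String body;

        Operation(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }
    }

    static class Result {
        final String method;
        final String path;
        final int status;
        final String body;

        Result(Operation op, int status, String body) {
            this.method = op.method;
            this.path = op.path;
            this.status = status;
            this.body = body;
        }
    }

    // Handlers keyed by "METHOD path"; each produces the result of one operation
    private final Map<String, Function<Operation, Result>> handlers = new HashMap<>();

    void register(String method, String path, Function<Operation, Result> handler) {
        handlers.put(key(method, path), handler);
    }

    // Executes the operations in order; every operation gets its own status
    List<Result> execute(List<Operation> operations) {
        List<Result> results = new ArrayList<>();
        for (Operation op : operations) {
            Function<Operation, Result> handler = handlers.get(key(op.method, op.path));
            results.add(handler == null ? new Result(op, 404, "") : handler.apply(op));
        }
        return results;
    }

    private static String key(String method, String path) {
        return method.toUpperCase() + " " + path;
    }
}
```

    Whether a failed operation should stop or roll back the remaining ones is a design decision this sketch deliberately leaves open.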

    This does not look too bad. In fact, APIs like this can be found in practice. Facebook for example uses a similar approach to batch multiple Graph API requests.

    However, there are some things to consider with this approach:

    How are the desired operations executed on the server side? Maybe they are implemented as simple method calls. It is also possible to create real HTTP requests from the JSON data and then process those requests. In this case, it is important to think about request headers which might contain important information required by the processing endpoint (e.g. authentication tokens, etc.).

    Headers in general are missing in this example. However, headers might be important. For example, it is perfectly viable for a server to respond to a POST request with HTTP 201 and an empty body (see my post about resource creation). The URI of the newly created resource is usually transported using a Location header. Without access to this header the client might not know how to look up the newly created resource. So think about adding support for headers in your request format.

    In the example we assume that all requests and responses use JSON data as body, which might not always be the case (think of file uploads for example). As an alternative we can define the request body as a string, which gives us more flexibility. In this case, we need to escape JSON double quotes, which can be awkward to read.

    An example request that includes headers and uses a string body might look like this:

    [
        {
            "path": "/users/43",
            "method": "put",
            "headers": [{ 
                "name": "Content-Type", 
                "value": "application/json"
            }],
            "body": "{ \"name\": \"Paul\" }"
        },
        ...
    ]

    Multipart Content-Type to the rescue?

    In the previous section we essentially translated HTTP requests and responses to JSON so we can group them together in a single request. However, we can do the same in a more standardized way with multipart content-types.

    A multipart Content-Type header indicates that the HTTP message body consists of multiple distinct body parts and each part can have its own Content-Type. We can use this to merge multiple HTTP requests into a single multipart request body.

    A quick note before we look at an example: My example snippets for HTTP requests and responses are usually simplified (unnecessary headers, HTTP versions, etc. might be skipped). However, in the next snippet we pack HTTP requests into the body of a multipart request requiring correct HTTP syntax. Therefore, the next snippets use the exact HTTP message syntax.

    Now let's look at an example multipart request containing two HTTP requests:

     1  POST http://api.my-cool-service.com/batch HTTP/1.1
     2  Content-Type: multipart/mixed; boundary=request_delimiter
     3  Content-Length: <total body length in bytes>
     4
     5  --request_delimiter
     6  Content-Type: application/http
     7  Content-ID: fa32d92f-87d9-4097-9aa3-e4aa7527c8a7
     8
     9  POST http://api.my-cool-service.com/products HTTP/1.1
    10  Content-Type: application/json
    11
    12  {
    13      "name": "Cool Gadget",
    14      "price": "$ 12.45 USD"
    15  }
    16  --request_delimiter
    17  Content-Type: application/http
    18  Content-ID: a0e98ffb-0b62-42a1-a321-54c6e9ef4c99
    19
    20  PUT http://api.my-cool-service.com/users/43 HTTP/1.1
    21  Content-Type: application/json
    22
    23  {
    24      "name": "Paul"
    25  }
    26  --request_delimiter--

    Multipart content types require a boundary parameter. This parameter specifies the so-called encapsulation boundary which acts like a delimiter between different body parts.

    Quoting the RFC:

    The encapsulation boundary is defined as a line consisting entirely of two hyphen characters ("-", decimal code 45) followed by the boundary parameter value from the Content-Type header field.

    In line 2 we set the Content-Type to multipart/mixed with a boundary parameter of request_delimiter. The blank line after the Content-Length header separates HTTP headers from the body. The following lines define the multipart request body.

    We start with the encapsulation boundary indicating the beginning of the first body part. Next follow the body part headers. Here, we set the Content-Type header of the body part to application/http which indicates that this body part contains an HTTP message. We also set a Content-ID header which can be used to identify a specific body part. We use a client generated UUID for this.

    The next blank line (line 8) indicates that now the actual body part begins (in our case that's the embedded HTTP request). The first body part ends with the encapsulation boundary at line 16.

    The next body part follows the encapsulation boundary and uses the same format as the first one.

    Note that the encapsulation boundary following the last body part contains two additional hyphens at the end which indicates that no further body parts will follow.
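
    To get a feeling for what the receiving side has to do, here is a simplified sketch that splits a multipart body into its raw body parts. MultipartSplitter is a hypothetical name. The code works on text, assumes CRLF line breaks and a well-formed body in which every body part contains at least the blank line after its headers; a production implementation should rely on a tested multipart parser instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified splitter for multipart bodies.
class MultipartSplitter {

    // Returns the raw body parts (part headers, blank line and content)
    static List<String> split(String body, String boundary) {
        String delimiter = "--" + boundary;
        List<String> parts = new ArrayList<>();
        int start = body.indexOf(delimiter);
        while (start >= 0) {
            int afterDelimiter = start + delimiter.length();
            if (body.startsWith("--", afterDelimiter)) {
                break; // closing delimiter reached, no further body parts
            }
            int next = body.indexOf("\r\n" + delimiter, afterDelimiter);
            if (next < 0) {
                break; // malformed body: no following delimiter
            }
            // + 2 skips the CRLF that ends the delimiter line
            parts.add(body.substring(afterDelimiter + 2, next));
            start = next + 2; // position of the next delimiter
        }
        return parts;
    }
}
```

    Each returned part still has to be separated into its part headers and the embedded HTTP message, which in turn needs to be parsed.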

    A response to this request might follow the same principle and look like this:

     1  HTTP/1.1 200
     2  Content-Type: multipart/mixed; boundary=response_delimiter
     3  Content-Length: <total body length in bytes>
     4
     5  --response_delimiter
     6  Content-Type: application/http
     7  Content-ID: fa32d92f-87d9-4097-9aa3-e4aa7527c8a7
     8
     9  HTTP/1.1 201 Created
    10  Content-Type: application/json
    11  Location: http://api.my-cool-service.com/products/123
    12
    13  {
    14      "id": 123,
    15      "name": "Cool Gadget",
    16      "price": "$ 12.45 USD"
    17  }
    18  --response_delimiter
    19  Content-Type: application/http
    20  Content-ID: a0e98ffb-0b62-42a1-a321-54c6e9ef4c99
    21  
    22  HTTP/1.1 200 OK
    23  Content-Type: application/json
    24
    25  {
    26      "id": 43,
    27      "name": "Paul"
    28  }
    29  --response_delimiter--

    This multipart response body contains two body parts, both containing HTTP responses. Note that the first body part also contains a Location header, which should be included when sending an HTTP 201 (Created) response status.

    Multipart messages seem like a nice way to merge multiple HTTP messages into a single message as they use a standardized and generally understood technique.

    However, there is one big caveat here. Clients and the server need to be able to construct and process the actual HTTP messages in raw text format. Usually this functionality is hidden behind HTTP client libraries and server side frameworks and might not be easily accessible.
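
    To get a feeling for what this raw processing involves, here is a rough Python sketch that splits a multipart/mixed response body and parses the embedded HTTP responses by hand. The function names are hypothetical, and the parsing is deliberately naive: it assumes the boundary never occurs inside a body part and that every embedded body is JSON or empty.

```python
import json


def parse_headers(lines):
    """Turn 'Name: value' lines into a dict with lower-cased names."""
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def parse_batch_response(body, boundary):
    """Split a multipart/mixed body and parse the embedded HTTP responses."""
    text = body.replace("\r\n", "\n")  # normalize line endings for simplicity
    results = []
    for chunk in text.split("--" + boundary)[1:]:
        if chunk.startswith("--"):  # closing boundary, no further body parts
            break
        # Body part headers and the embedded HTTP message are separated by a blank line
        part_head, _, embedded = chunk.strip("\n").partition("\n\n")
        http_head, _, http_body = embedded.partition("\n\n")
        status_line, *header_lines = http_head.split("\n")
        results.append({
            "content_id": parse_headers(part_head.split("\n")).get("content-id"),
            "status": int(status_line.split(" ")[1]),
            "headers": parse_headers(header_lines),
            "body": json.loads(http_body) if http_body.strip() else None,
        })
    return results


sample = "\r\n".join([
    "--response_delimiter",
    "Content-Type: application/http",
    "Content-ID: fa32d92f-87d9-4097-9aa3-e4aa7527c8a7",
    "",
    "HTTP/1.1 201 Created",
    "Content-Type: application/json",
    "Location: http://api.my-cool-service.com/products/123",
    "",
    '{"id": 123, "name": "Cool Gadget", "price": "$ 12.45 USD"}',
    "--response_delimiter",
    "Content-Type: application/http",
    "Content-ID: a0e98ffb-0b62-42a1-a321-54c6e9ef4c99",
    "",
    "HTTP/1.1 200 OK",
    "Content-Type: application/json",
    "",
    '{"id": 43, "name": "Paul"}',
    "--response_delimiter--",
    "",
])
parsed = parse_batch_response(sample, "response_delimiter")
```

    Even this simplified version needs to deal with status lines, headers and body separation manually, which is exactly the kind of work HTTP libraries usually do for us.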

    Bulk operations on REST resources

    In the previous examples we used a generic /batch endpoint that can be used to modify many different types of resources in a single request. Now we will apply bulk operations to a specific set of resources to move towards a more REST-like style.

    Sometimes only a single operation needs to support bulk data. In such a case, we can simply create a new resource that accepts a collection of bulk entries.

    For example, assume we want to import a couple of products with a single request:

    POST /product-import
    
    [
        {
            "name": "Cool Gadget",
            "price": "$ 12.45 USD"
        },
        {
            "name": "Very cool Gadget",
            "price": "$ 19.99 USD"
        },
        ...
    ]

    A simple response body might look like this:

    [
        {
            "status": "imported",
            "id": 234235
        },
        {
            "status": "failed",
            "error": "Product name too long, max 15 characters allowed"
        },
        ...
    ]

    Again we return a collection containing details about every entry. As we provide a response to a specific operation (importing products) there is no need to use a generic response format. Instead, we can use a specific format that communicates the import status and potential import errors.
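
    On the server side, such an import could be processed entry by entry. The following Python sketch is only an illustration under a few assumptions: import_products is a hypothetical function, the 15 character limit is taken from the error message in the example response, and the generated ids stand in for whatever a real database would assign.

```python
import itertools

MAX_NAME_LENGTH = 15             # taken from the error message in the example response
_next_id = itertools.count(1)    # stand-in for database-generated ids


def import_products(entries):
    """Import each entry independently and report one result per entry."""
    results = []
    for entry in entries:
        name = entry.get("name", "")
        if len(name) > MAX_NAME_LENGTH:
            results.append({
                "status": "failed",
                "error": f"Product name too long, max {MAX_NAME_LENGTH} characters allowed",
            })
            continue
        # A real implementation would persist the product here
        results.append({"status": "imported", "id": next(_next_id)})
    return results


results = import_products([
    {"name": "Cool Gadget", "price": "$ 12.45 USD"},
    {"name": "Very cool Gadget", "price": "$ 19.99 USD"},
])
```

    The result list keeps the order of the request entries, so clients can match each result to the entry they sent.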

    Partially updating collections

    In a previous post we learned that PATCH can be used for partial modification of resources. PATCH can also use a separate format to describe the desired changes.

    Both sound useful for implementing bulk operations. By using PATCH on a resource collection (e.g. /products) we can partially modify the collection. We can use this to add new elements to the collection or update existing elements.

    For example, we can use the following snippet to modify the /products collection:

    PATCH /products
    
    [
        {
            "action": "replace",
            "path": "/123",
            "value": {
                "name": "Yellow cap",
                "description": "It's a cap and it's yellow"
            }        
        },
        {
            "action": "delete",
            "path": "/124"
        },
        {
            "action": "create",
            "value": {
                "name": "Cool new product",
                "description": "It is very cool!"
            }
        }
    ]

    Here we perform three operations on the /products collection in a single request. We update resource /products/123 with new information, delete resource /products/124 and create a completely new product.

    A response might look something like this:

    [
        {
            "action": "replace",
            "path": "/123",
            "status": "success"
        },
        {
            "action": "delete",
            "path": "/124",
            "status": "success"
        },
        {
            "action": "create",
            "status": "success"
        }
    ]

    Here we need to use a generic response entry format again as it needs to be compatible with all possible request actions.

    However, it would be too easy without a huge caveat: PATCH requires changes to be applied atomically.

    The RFC says:

    The server MUST apply the entire set of changes atomically and never provide [..] a partially modified representation. If the entire patch document cannot be successfully applied, then the server MUST NOT apply any of the changes.

    I usually would not recommend implementing bulk operations atomically, as this can increase complexity a lot.

    A simple workaround to be compatible with the HTTP specifications is to create a separate sub-resource and use POST instead of PATCH.

    For example:

    POST /products/batch 
    

    (same request body as the previous PATCH request)
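
    A handler behind such a POST endpoint can process every entry independently. Here is a small Python sketch of that idea, using an in-memory dictionary instead of a real data store. The names apply_one and apply_batch as well as the id generator are assumptions made for this example.

```python
import itertools

_ids = itertools.count(1000)  # hypothetical id generator for newly created products


def apply_one(products, op):
    """Apply a single operation; raises KeyError if the target does not exist."""
    action = op["action"]
    if action == "create":
        products[str(next(_ids))] = op["value"]
    elif action == "replace":
        key = op["path"].lstrip("/")
        if key not in products:
            raise KeyError(key)
        products[key] = op["value"]
    elif action == "delete":
        del products[op["path"].lstrip("/")]  # raises KeyError if missing
    else:
        raise ValueError("unsupported action: " + action)


def apply_batch(products, operations):
    """Apply operations one by one; a failed entry does not stop the others."""
    results = []
    for op in operations:
        result = {key: op[key] for key in ("action", "path") if key in op}
        try:
            apply_one(products, op)
            result["status"] = "success"
        except KeyError:
            result.update(status="failed", error="resource not found")
        except ValueError as error:
            result.update(status="failed", error=str(error))
        results.append(result)
    return results


products = {"123": {"name": "Cap"}, "124": {"name": "Hat"}}
results = apply_batch(products, [
    {"action": "replace", "path": "/123", "value": {"name": "Yellow cap"}},
    {"action": "delete", "path": "/124"},
    {"action": "delete", "path": "/999"},
    {"action": "create", "value": {"name": "Cool new product"}},
])
```

    Because entries are applied independently, the third operation fails while the other three are still applied. This is simple to implement, but it is not atomic.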

    If you really want to go the atomic way, you might need to think about the response format again. In this case, it is not possible that some requested changes are applied while others are not. Instead, you need to communicate which requested changes failed and which could have been applied if everything else had worked.

    In this case, a response might look like this:

    [
        {
            "action": "replace",
            "path": "/123",
            "status": "rolled back"
        }, 
        {
            "action": "delete",
            "path": "/124",
            "status": "failed",
            "error": "resource not found"
        },
        ..
    ]
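
    One possible way to get this behaviour is to apply all operations to a copy of the data and only keep the copy if nothing failed. The Python sketch below shows the idea with an in-memory dictionary. It is an assumption-laden illustration (apply_one and apply_atomically are made-up names); a real service would more likely rely on a database transaction.

```python
import copy
import itertools

_ids = itertools.count(1000)  # hypothetical id generator for newly created products


def apply_one(products, op):
    """Apply a single operation; raises KeyError if the target does not exist."""
    action = op["action"]
    if action == "create":
        products[str(next(_ids))] = op["value"]
    elif action == "replace":
        key = op["path"].lstrip("/")
        if key not in products:
            raise KeyError(key)
        products[key] = op["value"]
    elif action == "delete":
        del products[op["path"].lstrip("/")]  # raises KeyError if missing
    else:
        raise ValueError("unsupported action: " + action)


def apply_atomically(products, operations):
    """Return (products, results); the original data is returned if any entry failed."""
    working_copy = copy.deepcopy(products)
    results = []
    for op in operations:
        result = {key: op[key] for key in ("action", "path") if key in op}
        try:
            apply_one(working_copy, op)
            result["status"] = "success"
        except KeyError:
            result.update(status="failed", error="resource not found")
        except ValueError as error:
            result.update(status="failed", error=str(error))
        results.append(result)
    if any(result["status"] == "failed" for result in results):
        # Nothing is applied: entries that would have worked are reported as rolled back
        for result in results:
            if result["status"] == "success":
                result["status"] = "rolled back"
        return products, results
    return working_copy, results


products = {"123": {"name": "Cap"}}
new_products, results = apply_atomically(products, [
    {"action": "replace", "path": "/123", "value": {"name": "Yellow cap"}},
    {"action": "delete", "path": "/124"},
])
```

    In this run the delete fails because /124 does not exist, so the replace is reported as rolled back and the original data stays untouched.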

    Which HTTP status code is appropriate for responses to bulk requests?

    With bulk requests we have the problem that some parts of the request might execute successfully while others fail. If everything worked, it is easy: we can simply return HTTP 200 OK.

    Even if all requested changes fail it can be argued that HTTP 200 is still a valid response code as long as the bulk operation itself completed successfully.

    Either way, the client needs to process the response body to get detailed information about the processing status.
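
    On the client side this could look roughly like the following Python sketch. It assumes the response entries use a status field with the value failed and an optional error field, as in the examples above; summarize_bulk_response is a hypothetical helper.

```python
def summarize_bulk_response(status_code, entries):
    """Inspect the per-entry results of a bulk response that returned HTTP 200."""
    if status_code != 200:
        # The bulk operation itself did not complete
        raise RuntimeError(f"bulk request failed with HTTP {status_code}")
    failed = [entry for entry in entries if entry.get("status") == "failed"]
    return {
        "total": len(entries),
        "failed": len(failed),
        "errors": [entry.get("error") for entry in failed],
    }


summary = summarize_bulk_response(200, [
    {"status": "imported", "id": 234235},
    {"status": "failed", "error": "Product name too long, max 15 characters allowed"},
])
```

    The HTTP status only tells the client that the bulk operation ran. Whether individual entries succeeded can only be read from the body.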

    Another idea that might come to mind is HTTP 207 (Multi-Status). HTTP 207 is part of RFC 4918 (HTTP extensions for WebDAV) and is described like this:

    A Multi-Status response conveys information about multiple resources in situations where multiple status codes might be appropriate. [..] Although '207' is used as the overall response status code, the recipient needs to consult the contents of the multistatus response body for further information about the success or failure of the method execution. The response MAY be used in success, partial success and also in failure situations.

    So far this reads like a great fit.

    Unfortunately, HTTP 207 is part of the WebDAV specification and requires a specific response body format that looks like this:

    <?xml version="1.0" encoding="utf-8" ?>
    <d:multistatus xmlns:d="DAV:">
        <d:response>
            <d:href>http://www.example.com/container/resource3</d:href>
            <d:status>HTTP/1.1 423 Locked</d:status>
            <d:error><d:lock-token-submitted/></d:error>
        </d:response>
    </d:multistatus>

    This is likely not the response format you want. Some might argue that it is fine to reuse HTTP 207 with a custom response format. Personally, I would not recommend this and would use a simple HTTP 200 status code instead.

    If the bulk request is processed asynchronously, HTTP 202 (Accepted) is the status code to use.

    Summary

    We looked at different approaches to building bulk APIs. All approaches have different up- and downsides. There is no single correct way as it always depends on your requirements.

    If you need a generic way to submit multiple actions in a single request you can use a custom JSON format. Alternatively you can use a multipart content-type to merge multiple requests into a single request.

    You can also come up with separate resources that express the desired operation. This is usually the simplest and most pragmatic way if you only have one or a few operations that need to support bulk data.

    In all scenarios you should evaluate if bulk operations really produce the desired performance gains. Otherwise, the additional complexity of bulk operations is usually not worth the effort.

     

    Interested in more REST related articles? Have a look at my REST API design page.