mscharhag, Programming and Stuff

A blog about programming and software development topics, mostly focused on Java technologies including Java EE, Spring and Grails.

Wednesday, 2 September, 2020

REST: Dealing with Pagination

In a previous post we learned how to retrieve resource collections. When those collections become larger, it is often useful to provide a way for clients to retrieve partial collections.

Assume we provide a REST API for painting data. Our database might contain thousands of paintings. However, a web interface showing these paintings to users might only be able to show ten paintings at a time. To view more paintings, the user needs to navigate to the next page, which shows the following ten paintings. This process of dividing the content into smaller consumable sections (pages) is called pagination.

Pagination can be an essential part of your API if you are dealing with large collections.

In the following sections we will look at different types of pagination.

Using page and size parameters

The page parameter tells which page should be returned while size indicates how many elements a page should contain.

For example, this might return the first page, containing 10 painting resources.

GET /paintings?page=1&size=10

To get the next page we simply increase the page parameter by one.

Unfortunately, it is not always clear whether page numbering starts at 0 or 1, so make sure to document this properly.

(In my opinion, 1 should be preferred because it matches the way people naturally count pages.)

A minor issue with this approach might be that the client cannot change the size parameter for a specific page.

For example, after getting the first 10 items of a collection by issuing

GET /paintings?page=1&size=10

we cannot get the second page with a size of 15 by requesting:

GET /paintings?page=2&size=15

This returns items 15-29 of the collection (counting from zero). So, we missed five items (10-14).
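The arithmetic behind this gap can be sketched in a few lines of Python. This is only an illustration, assuming 1-based page numbers, zero-based item indices and a plain list standing in for the collection; the helper name page_bounds is hypothetical.

```python
def page_bounds(page, size):
    """Return the zero-based (start, end) indices of a 1-based page; end is exclusive."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    start = (page - 1) * size
    return start, start + size


paintings = list(range(100))  # stand-in collection: the item at index i is i

first = paintings[slice(*page_bounds(1, 10))]   # items 0-9
second = paintings[slice(*page_bounds(2, 15))]  # items 15-29

# Items 10-14 appear on neither page because the size changed between requests.
missed = [p for p in paintings[:30] if p not in first and p not in second]
```

Because the start index is always derived from (page - 1) * size, a page number only has a stable meaning as long as size stays the same.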

Using offset and limit parameters

Another, very similar approach is to use offset and limit parameters. offset tells the server the number of items that should be skipped, while limit indicates the number of items to be returned.

For example, this might return the first 10 painting resources:

GET /paintings?offset=0&limit=10

An offset parameter of 0 means that no elements should be skipped.

We can get the following 10 resources by skipping the first 10 resources (= setting the offset to 10):

GET /paintings?offset=10&limit=10

This approach is a bit more flexible because offset and limit do not affect each other. So we can increase the limit for a specific page. We just need to make sure to adjust the offset parameter for the next page request accordingly.

For example, this can be useful if a client displays data using an infinitely scrollable list. If the user scrolls faster, the client might request a larger chunk of resources with the next request.
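A client that changes the limit between requests could look roughly like the following sketch. The fetch function is a hypothetical stand-in for GET /paintings?offset=...&limit=... and reads from an in-memory list; a real client would issue HTTP requests instead.

```python
PAINTINGS = [f"painting-{i}" for i in range(35)]  # stand-in for the server's data


def fetch(offset, limit):
    """Hypothetical stand-in for GET /paintings?offset=...&limit=..."""
    return PAINTINGS[offset:offset + limit]


def fetch_all(limits):
    """Yield chunks, using the next entry of `limits` per request (the last one repeats)."""
    offset = 0
    step = 0
    while True:
        limit = limits[min(step, len(limits) - 1)]
        chunk = fetch(offset, limit)
        if not chunk:
            return
        yield chunk
        offset += len(chunk)  # advance by what was actually received
        step += 1


chunks = list(fetch_all([10, 15, 20]))
```

The important detail is that the next offset is computed from the number of items already received, not from the limit of the upcoming request.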

The downsides?

Both previous solutions can work fine. They are often very easy to implement. However, both share two downsides.

Depending on the underlying database and data structure, you might run into performance problems for large offsets / page numbers. This is often an issue for relational databases (see this Stack Overflow question for MySQL or this one for Postgres).

Another problem is resource skipping caused by delete operations. Assume we request the first page by issuing:

GET /paintings?page=1&size=10

After we retrieved the response, someone deletes a resource that is located on the first page. Now we request the second page with:

GET /paintings?page=2&size=10

We have now skipped one resource. Due to the deletion of a resource on the first page, all following resources in the collection move one position forward. The first resource of page two has moved to page one.
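The effect is easy to reproduce with a small simulation. The sketch below assumes 1-based pages and uses a list of ids as a stand-in for the collection; get_page is a hypothetical helper, not part of any real API.

```python
def get_page(items, page, size):
    """Return a 1-based page from the current state of the collection."""
    start = (page - 1) * size
    return items[start:start + size]


paintings = list(range(1, 26))          # ids 1..25
page_one = get_page(paintings, 1, 10)   # ids 1..10

paintings.remove(4)                     # someone deletes a resource from page one

page_two = get_page(paintings, 2, 10)   # now ids 12..21; id 11 slid onto page one
seen = set(page_one) | set(page_two)
skipped = [p for p in paintings[:20] if p not in seen]
```

Here the resource with id 11 still exists, but the client never sees it unless it requests the first page again.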

Seek Pagination

An approach to solve those downsides is called Seek Pagination. Here, we use resource identifiers to indicate the collection offset.

For example, this might return the first five resources:

GET /paintings?limit=5

Response:

[
    { "id" : 2, ... },
    { "id" : 3, ... },
    { "id" : 5, ... },
    { "id" : 8, ... },
    { "id" : 9, ... }
]

To get the next five resources, we pass the id of the last resource we received:

GET /paintings?last_id=9&limit=5

Response:

[
    { "id" : 10, ... },
    { "id" : 11, ... },
    { "id" : 13, ... },
    { "id" : 14, ... },
    { "id" : 17, ... }
]

This way we can make sure we do not accidentally skip a resource.

For a relational database this is now much simpler. It is very likely that we just have to compare the primary key to the last_id parameter. The resulting query probably looks similar to this:

select * from painting where id > last_id order by id limit 5;
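To try this out, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table layout, the sample ids and the helper name seek_page are assumptions made for illustration; passing 0 as the initial last_id assumes that all ids are positive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table painting (id integer primary key, name text)")
ids = [2, 3, 5, 8, 9, 10, 11, 13, 14, 17]
conn.executemany(
    "insert into painting (id, name) values (?, ?)",
    [(i, f"painting {i}") for i in ids],
)


def seek_page(conn, last_id, limit):
    """Return the ids of up to `limit` paintings whose id is greater than `last_id`."""
    rows = conn.execute(
        "select id from painting where id > ? order by id limit ?",
        (last_id, limit),
    )
    return [row[0] for row in rows]


first = seek_page(conn, 0, 5)            # no last_id yet
conn.execute("delete from painting where id = 3")  # a delete on the first page
second = seek_page(conn, first[-1], 5)   # continue after the last id we received
```

The delete between the two requests does not shift the second page, because the query is anchored to an id instead of a position.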

Response format

When using JSON, partial results should be returned as a JSON object (instead of a JSON array). Besides the collection items, the total number of items should be included.

Example response:

{
    "total": 4321,
    "items": [
        {
            "id": 1,
            "name": "Mona Lisa",
            "artist": "Leonardo da Vinci"
        }, {
            "id": 2,
            "name": "The Starry Night",
            "artist": "Vincent van Gogh"
        }
    ]
}

When using page and size parameters it is also a good idea to return the total number of available pages.
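A server-side helper that produces such a response might look like the following sketch. It assumes 1-based pages and an in-memory list; the field name totalPages is an assumption, as the exact name is up to the API.

```python
import json
import math


def build_page_response(items, page, size):
    """Wrap one 1-based page of `items` in an object with total counts."""
    start = (page - 1) * size
    return {
        "total": len(items),
        "totalPages": math.ceil(len(items) / size),  # the last page may be partial
        "items": items[start:start + size],
    }


paintings = [{"id": i} for i in range(1, 44)]  # 43 stand-in paintings
response = build_page_response(paintings, page=5, size=10)
body = json.dumps(response)
```

With 43 paintings and a size of 10 this yields five pages, the last of which contains only three items.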

Hypermedia controls

If you are using hypermedia controls in your API, you should also add links for first, last, next and previous pages. This helps decouple the client from your pagination logic.

For example:

GET /paintings?offset=0&limit=10
{
    "total": 4317,
    "items": [
        {
            "id": 1,
            "name": "Mona Lisa",
            "artist": "Leonardo da Vinci"
        }, {
            "id": 2,
            "name": "The Starry Night",
            "artist": "Vincent van Gogh"
        },
        ...
    ],
    "links": [
        { "rel": "self", "href": "/paintings?offset=0&limit=10" },
        { "rel": "next", "href": "/paintings?offset=10&limit=10" },
        { "rel": "last", "href": "/paintings?offset=4310&limit=10" },
        { "rel": "by-offset", "href": "/paintings?offset={offset}&limit=10" }
    ]
}

Note that we requested the first page. Therefore, the first and previous links are missing. The by-offset link uses a URI Template, so the client can choose an arbitrary offset.
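Generating these links on the server can be sketched as follows. The function name pagination_links is hypothetical, the by-offset template link is left out for brevity, and the calculation of the last offset assumes that offsets are multiples of limit.

```python
def pagination_links(total, offset, limit, base="/paintings"):
    """Build self/first/previous/next/last links for an offset/limit page."""
    def href(o):
        return f"{base}?offset={o}&limit={limit}"

    links = [{"rel": "self", "href": href(offset)}]
    if offset > 0:
        # Not on the first page: offer a way back.
        links.append({"rel": "first", "href": href(0)})
        links.append({"rel": "previous", "href": href(max(offset - limit, 0))})
    if offset + limit < total:
        # More items follow: offer next and last.
        last_offset = (total - 1) // limit * limit
        links.append({"rel": "next", "href": href(offset + limit)})
        links.append({"rel": "last", "href": href(last_offset)})
    return links


links = pagination_links(total=4317, offset=0, limit=10)
```

A client that follows the next link until it is missing never has to compute an offset itself.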

Range headers and HTTP status 206 (partial content)

So far we passed pagination options as request parameters. However, we can also follow an alternative approach using Range and Content-Range headers.

In the next example request, the client uses the Range header to request the first 10 paintings:

GET /paintings
Range: items=0-9

The Range header is used to request only specific parts of the resource and requires the following format:

Range: <unit>=<range-start>-<range-end>

With:

  • <unit> - The unit in which the range is specified. Often bytes is used. However, for APIs we can also use something like items.
  • <range-start> - Start of the requested range
  • <range-end> - End of the requested range

The server responds to this request with HTTP status 206 (Partial Content) which requires a Content-Range header:

HTTP/1.1 206 Partial Content
Content-Range: items 0-9/34

[
	... first 10 items
]

Within the Content-Range header the server communicates the offsets of the returned items and the total amount of items. The required format of Content-Range looks like this:

Content-Range: <unit> <range-start>-<range-end>/<size>

With:

  • <unit> - The unit of the returned range
  • <range-start> - Beginning of the range
  • <range-end> - End of the range
  • <size> - Total number of items in the collection. Can be * if the total is not known

While this approach can work fine, it is usually easier to work with query parameters than parsing Range and Content-Range headers. It is also not possible to provide hypermedia pagination links if we communicate pagination offsets within headers.
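To give an impression of the parsing involved, here is a minimal sketch that handles the two headers. It assumes a custom items unit and inclusive range ends (as with byte ranges), and it only supports the simple <start>-<end> form; the function names parse_range and content_range are hypothetical, and requests that start beyond the end of the collection are not handled.

```python
import re

RANGE_PATTERN = re.compile(r"^(\w+)=(\d+)-(\d+)$")


def parse_range(header):
    """Parse '<unit>=<start>-<end>' into (unit, start, end); raise ValueError otherwise."""
    match = RANGE_PATTERN.match(header.strip())
    if not match:
        raise ValueError(f"unsupported Range header: {header!r}")
    unit, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("range start must not exceed range end")
    return unit, start, end


def content_range(unit, start, end, total=None):
    """Format a Content-Range value; an unknown total is written as '*'."""
    size = "*" if total is None else str(total)
    return f"{unit} {start}-{end}/{size}"


paintings = list(range(34))
unit, start, end = parse_range("items=0-9")
end = min(end, len(paintings) - 1)   # clamp to the last available item
page = paintings[start:end + 1]      # the end of the range is inclusive
header = content_range(unit, start, end, len(paintings))
```

Compared to reading two query parameters, this is noticeably more code for the same information.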

 

Interested in more REST related articles? Have a look at my REST API design page.
