mscharhag, Programming and Stuff;

A blog about programming and software development topics, mostly focused on Java technologies including Java EE, Spring and Grails.

Thursday, 8 April, 2021

Looking into the JDK 16 vector API

JDK 16 comes with the incubator module jdk.incubator.vector (JEP 338) which provides a portable API for expressing vector computations. In this post we will have a quick look at this new API.

Note that the API is in incubator status and likely to change in future releases.

Why vector operations?

When supported by the underlying hardware, vector operations can increase the number of computations performed in a single CPU cycle.

Assume we want to add two vectors, each containing a sequence of four integer values. Vector hardware allows us to perform this operation (four integer additions in total) in a single CPU cycle. An ordinary scalar addition performs only one integer addition in the same time.
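
To make the four-additions example concrete, here is a minimal sketch that adds two int arrays of length four, once with four scalar additions and once with a single IntVector addition. It assumes JDK 16 (or newer) with the jdk.incubator.vector module added; the class and method names are made up for illustration. Whether the vector add really becomes one hardware instruction depends on the CPU and the JIT compiler.

```java
import java.util.Arrays;
import jdk.incubator.vector.IntVector;

public class FourIntAdditions {

    // Scalar version: four separate integer additions
    static int[] scalarAdd(int[] a, int[] b) {
        int[] c = new int[4];
        for (int i = 0; i < 4; i++) {
            c[i] = a[i] + b[i];
        }
        return c;
    }

    // Vector version: one lane-wise addition on a 128-bit vector (4 x 32-bit int)
    static int[] vectorAdd(int[] a, int[] b) {
        IntVector va = IntVector.fromArray(IntVector.SPECIES_128, a, 0);
        IntVector vb = IntVector.fromArray(IntVector.SPECIES_128, b, 0);
        return va.add(vb).toArray();
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        System.out.println(Arrays.toString(scalarAdd(a, b))); // [11, 22, 33, 44]
        System.out.println(Arrays.toString(vectorAdd(a, b))); // [11, 22, 33, 44]
    }
}
```

Both methods compute the same result; the difference is only in how many operations the source code expresses.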

The new vector API allows us to define vector operations in a platform-agnostic way. At runtime, these operations are compiled to vector hardware instructions where the CPU supports them.
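
One place where the platform-agnostic design shows up is the choice of vector size. Instead of hard-coding a 128-bit species, code can use FloatVector.SPECIES_PREFERRED, the species the running platform prefers for float values. A small sketch, assuming JDK 16+ with the incubator module added (the class name is illustrative):

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class PreferredSpecies {

    // The float species the platform considers best for the current CPU
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Number of float lanes in one vector of the preferred shape
    static int lanes() {
        return SPECIES.length();
    }

    public static void main(String[] args) {
        System.out.println("Vector size in bits: " + SPECIES.vectorBitSize());
        System.out.println("Float lanes per vector: " + lanes());
    }
}
```

The printed values depend on the hardware the program runs on, which is the point: loops written against SPECIES_PREFERRED and SPECIES.length() can use wider vectors without code changes.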

Note that HotSpot already supports auto-vectorization, which can transform scalar operations into vector hardware instructions. However, this approach is quite limited and utilizes only a small set of the available vector hardware instructions.
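
For comparison, auto-vectorization works on plain scalar code. The following sketch shows the kind of simple counted loop, with no dependencies between iterations, that HotSpot's C2 JIT compiler may vectorize on its own (via its SuperWord optimization) once the method is hot. The Java source contains no vector code, and whether vectorization actually happens depends on the JVM version, flags and hardware; the class name is illustrative.

```java
import java.util.Arrays;

public class AutoVectorizationCandidate {

    // Simple element-wise loop: each iteration is independent of the others,
    // which makes it a candidate for automatic vectorization by the JIT
    static float[] addArrays(float[] a, float[] b) {
        float[] c = new float[a.length];
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
        return c;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {5f, 8f, 10f, 12f};
        System.out.println(Arrays.toString(addArrays(a, b))); // [6.0, 10.0, 13.0, 16.0]
    }
}
```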

A few example domains that might benefit from the new vector API are machine learning, linear algebra or cryptography.
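
As a small taste of the linear algebra use case, here is a sketch of a dot product written with the vector API. It assumes JDK 16+ with the incubator module added; the class name is illustrative. It processes full vectors up to SPECIES.loopBound(..), sums the lanes with reduceLanes(VectorOperators.ADD) and handles the remaining elements with a scalar loop.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class DotProduct {

    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_128;

    // Dot product of two float arrays of equal length
    static float dot(float[] a, float[] b) {
        FloatVector acc = FloatVector.zero(SPECIES);
        // Largest multiple of the lane count that is <= a.length
        int upperBound = SPECIES.loopBound(a.length);
        int i = 0;
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            // Lane-wise multiply, then accumulate per lane
            acc = acc.add(va.mul(vb));
        }
        // Add up the lanes of the accumulator
        float sum = acc.reduceLanes(VectorOperators.ADD);
        // Scalar loop for the elements that do not fill a whole vector
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        float[] b = {1f, 1f, 1f, 1f, 2f};
        System.out.println(dot(a, b)); // 20.0
    }
}
```

Because the lanes are summed in a different order than in a sequential loop, results for arbitrary float data can differ slightly from a purely scalar implementation.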

Enabling the vector incubator module (jdk.incubator.vector)

To use the new vector API we need JDK 16 (or newer). We also need to add the jdk.incubator.vector module to our project. This can be done with a module-info.java file:

module com.mscharhag.vectorapi {
    requires jdk.incubator.vector;
}

Implementing a simple vector operation

Let's start with a simple example:

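import jdk.incubator.vector.FloatVector;

float[] a = new float[] {1f, 2f, 3f, 4f};
float[] b = new float[] {5f, 8f, 10f, 12f};

FloatVector first = FloatVector.fromArray(FloatVector.SPECIES_128, a, 0);
FloatVector second = FloatVector.fromArray(FloatVector.SPECIES_128, b, 0);

// add both vectors, square each lane, then negate each lane
FloatVector result = first
        .add(second)
        .pow(2)
        .neg();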

We start with two float arrays (a and b) each containing four elements. These provide the input data for our vectors.

Next we create two FloatVectors using the static fromArray(..) factory method. The first parameter is the vector species (here FloatVector.SPECIES_128), which defines the element type and the size of the vector in bits (here 128). The second parameter is the source array. Using the last parameter we can define an offset into the passed array (here we use 0).
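
To illustrate the offset parameter, this sketch loads four floats from different positions of an eight-element array. It assumes JDK 16+ with the incubator module added; the class and method names are illustrative.

```java
import java.util.Arrays;
import jdk.incubator.vector.FloatVector;

public class LoadWithOffset {

    // Loads four consecutive floats, starting at 'offset', into a 128-bit vector
    static float[] loadAt(float[] data, int offset) {
        FloatVector v = FloatVector.fromArray(FloatVector.SPECIES_128, data, offset);
        return v.toArray();
    }

    public static void main(String[] args) {
        float[] data = {1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f};
        System.out.println(Arrays.toString(loadAt(data, 0))); // [1.0, 2.0, 3.0, 4.0]
        System.out.println(Arrays.toString(loadAt(data, 4))); // [5.0, 6.0, 7.0, 8.0]
    }
}
```

This unmasked variant of fromArray(..) expects a full vector's worth of elements starting at the offset; with an offset of 5 it would throw an IndexOutOfBoundsException here.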

In Java a float value has a size of four bytes (= 32 bits). So four float values exactly fill our 128-bit vector.
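
The same arithmetic (vector size in bits divided by element size in bits) determines the number of lanes for every species. A short sketch, assuming JDK 16+ with the incubator module added, compares the manual calculation with what the species objects report; the class and field names are illustrative.

```java
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.IntVector;

public class LaneCounts {

    // Lanes per vector = vector size in bits / element size in bits
    static int lanesFromBits(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    // The same numbers as reported by the species objects
    static final int FLOAT_128_LANES = FloatVector.SPECIES_128.length();   // 128 / 32 = 4
    static final int DOUBLE_128_LANES = DoubleVector.SPECIES_128.length(); // 128 / 64 = 2
    static final int INT_256_LANES = IntVector.SPECIES_256.length();       // 256 / 32 = 8

    public static void main(String[] args) {
        System.out.println(lanesFromBits(128, Float.SIZE)); // 4
        System.out.println(FLOAT_128_LANES);                // 4
        System.out.println(DOUBLE_128_LANES);               // 2
        System.out.println(INT_256_LANES);                  // 8
    }
}
```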

After that, we can define our vector operations. In this example we add both vectors together, then we square and negate the result.

The resulting vector contains the values:

[-36.0, -100.0, -169.0, -256.0]
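
To check these values, the following sketch computes the same thing per element with a scalar loop. It also shows a vector variant that squares by multiplying the vector with itself via mul(..) instead of using pow(2). It assumes JDK 16+ with the incubator module added; the class and method names are illustrative.

```java
import java.util.Arrays;
import jdk.incubator.vector.FloatVector;

public class AddSquareNegate {

    // Scalar reference: what the vector chain computes for each lane
    static float[] scalar(float[] a, float[] b) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            float sum = a[i] + b[i];
            out[i] = -(sum * sum);
        }
        return out;
    }

    // Vector variant that squares with mul(..) instead of pow(2)
    static float[] vector(float[] a, float[] b) {
        FloatVector sum = FloatVector.fromArray(FloatVector.SPECIES_128, a, 0)
                .add(FloatVector.fromArray(FloatVector.SPECIES_128, b, 0));
        return sum.mul(sum).neg().toArray();
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {5f, 8f, 10f, 12f};
        System.out.println(Arrays.toString(scalar(a, b))); // [-36.0, -100.0, -169.0, -256.0]
        System.out.println(Arrays.toString(vector(a, b))); // [-36.0, -100.0, -169.0, -256.0]
    }
}
```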

We can write the resulting vector into an array using the intoArray(..) method:

float[] resultArray = new float[4];
result.intoArray(resultArray, 0);
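
Besides intoArray(..) with offset 0, the result can be written at another offset of a larger array, copied with toArray(), or read one lane at a time with lane(..). A sketch, assuming JDK 16+ with the incubator module added (class and method names are illustrative):

```java
import java.util.Arrays;
import jdk.incubator.vector.FloatVector;

public class ReadResults {

    static FloatVector load(float[] values) {
        return FloatVector.fromArray(FloatVector.SPECIES_128, values, 0);
    }

    // Writes the four lanes into a larger array, starting at 'offset'
    static float[] writeAt(float[] values, int targetLength, int offset) {
        float[] target = new float[targetLength];
        load(values).intoArray(target, offset);
        return target;
    }

    // Reads a single lane by its index
    static float laneAt(float[] values, int index) {
        return load(values).lane(index);
    }

    public static void main(String[] args) {
        float[] values = {-36f, -100f, -169f, -256f};
        System.out.println(Arrays.toString(load(values).toArray())); // [-36.0, -100.0, -169.0, -256.0]
        System.out.println(Arrays.toString(writeAt(values, 6, 2)));  // [0.0, 0.0, -36.0, -100.0, -169.0, -256.0]
        System.out.println(laneAt(values, 1));                       // -100.0
    }
}
```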

In this example we use FloatVector to define operations on float values. Of course we can use other numeric types too. Vector classes are available for byte, short, int, long, float and double (ByteVector, ShortVector, IntVector, LongVector, FloatVector and DoubleVector).
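
The other vector classes follow the same pattern. This sketch multiplies doubles with a 256-bit DoubleVector (four lanes) and subtracts longs with a 128-bit LongVector (two lanes). It assumes JDK 16+ with the incubator module added; the class and method names are illustrative.

```java
import java.util.Arrays;
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.LongVector;

public class OtherElementTypes {

    // 256-bit species: 256 / 64 = 4 doubles per vector
    static double[] multiply(double[] a, double[] b) {
        DoubleVector va = DoubleVector.fromArray(DoubleVector.SPECIES_256, a, 0);
        DoubleVector vb = DoubleVector.fromArray(DoubleVector.SPECIES_256, b, 0);
        return va.mul(vb).toArray();
    }

    // 128-bit species: 128 / 64 = 2 longs per vector
    static long[] subtract(long[] a, long[] b) {
        LongVector va = LongVector.fromArray(LongVector.SPECIES_128, a, 0);
        LongVector vb = LongVector.fromArray(LongVector.SPECIES_128, b, 0);
        return va.sub(vb).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(multiply(new double[] {1.5, 2, 3, 4}, new double[] {2, 2, 2, 2}))); // [3.0, 4.0, 6.0, 8.0]
        System.out.println(Arrays.toString(subtract(new long[] {10, 20}, new long[] {1, 2}))); // [9, 18]
    }
}
```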

Working with loops

While the previous example was simple to understand, it does not show a typical use case of the new vector API. To gain any benefits from vector operations we usually need to process larger amounts of data.

In the following example we start with three arrays a, b and c, each having 10000 elements. We want to add the values of a and b and store the sums in c: c[i] = a[i] + b[i].
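
For reference, the scalar version of this computation is a plain loop. The sketch below also provides one possible implementation of the randomFloatArray(..) helper; that helper is hypothetical here (its implementation is not part of this post) and simply fills an array from a seeded java.util.Random so runs are reproducible.

```java
import java.util.Random;

public class ScalarReference {

    // Hypothetical helper: one possible randomFloatArray(..) implementation,
    // using a seeded Random for reproducible values
    private static final Random RANDOM = new Random(42);

    static float[] randomFloatArray(int size) {
        float[] result = new float[size];
        for (int i = 0; i < size; i++) {
            result[i] = RANDOM.nextFloat(); // value in [0.0, 1.0)
        }
        return result;
    }

    // Plain scalar loop: c[i] = a[i] + b[i]
    static float[] add(float[] a, float[] b) {
        float[] c = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
        return c;
    }

    public static void main(String[] args) {
        float[] a = randomFloatArray(10_000);
        float[] b = randomFloatArray(10_000);
        float[] c = add(a, b);
        System.out.println(c.length); // 10000
    }
}
```

Keeping such a scalar version around is also handy for checking that a vectorized loop produces the same results.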

Our code looks like this:

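import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_128;

// randomFloatArray(..) is a helper (not shown) that returns an array of random floats
float[] a = randomFloatArray(10_000);
float[] b = randomFloatArray(10_000);
float[] c = new float[10_000];

// advance by the number of lanes (4 floats for a 128-bit species)
for (int i = 0; i < a.length; i += SPECIES.length()) {
    // lanes whose index would be >= a.length are switched off
    VectorMask<Float> mask = SPECIES.indexInRange(i, a.length);
    FloatVector first = FloatVector.fromArray(SPECIES, a, i, mask);
    FloatVector second = FloatVector.fromArray(SPECIES, b, i, mask);
    first.add(second).intoArray(c, i, mask);
}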

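Here we iterate over the input arrays in strides of the vector length. A VectorMask helps us if vectors cannot be completely filled from input data (e.g. during the last loop iteration): lanes outside the array bounds are disabled, so nothing is read or written past the end of the arrays.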
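
Since 10,000 is a multiple of four, every mask in the loop above actually has all lanes set. To see the mask at work, this sketch runs the same loop on arrays of length 10, where the last iteration starts at index 8 and only two elements remain. It assumes JDK 16+ with the incubator module added; the class and method names are illustrative.

```java
import java.util.Arrays;
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

public class MaskedTail {

    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_128;

    // Active lanes for a block starting at 'offset' in an array of length 'length'
    static boolean[] activeLanes(int offset, int length) {
        return SPECIES.indexInRange(offset, length).toArray();
    }

    // Same masked loop as above, returning a new result array
    static float[] add(float[] a, float[] b) {
        float[] c = new float[a.length];
        for (int i = 0; i < a.length; i += SPECIES.length()) {
            VectorMask<Float> mask = SPECIES.indexInRange(i, a.length);
            FloatVector first = FloatVector.fromArray(SPECIES, a, i, mask);
            FloatVector second = FloatVector.fromArray(SPECIES, b, i, mask);
            first.add(second).intoArray(c, i, mask);
        }
        return c;
    }

    public static void main(String[] args) {
        // Indices 8 and 9 exist, indices 10 and 11 do not
        System.out.println(Arrays.toString(activeLanes(8, 10))); // [true, true, false, false]

        float[] a = {1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f, 9f, 10f};
        float[] b = {10f, 10f, 10f, 10f, 10f, 10f, 10f, 10f, 10f, 10f};
        System.out.println(Arrays.toString(add(a, b)));
        // [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0]
    }
}
```

A common alternative is to run unmasked vector operations up to SPECIES.loopBound(a.length) and handle the few remaining elements with a scalar loop.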


Summary

We can use the new vector API to define vector operations that are optimized for vector hardware. This way we can increase the number of computations performed in a single CPU cycle. The central elements of the vector API are type-specific vector classes like FloatVector or LongVector.

You can find the example source code on GitHub.
