mscharhag, Programming and Stuff;

A blog about programming and software development topics, mostly focused on Java technologies including Java EE, Spring and Grails.

Monday, 21 September, 2020

Command-line JSON processing with jq

In this post we will learn how to parse, pretty-print and process JSON from the command line with jq. At the end we will even use jq to do a simple JSON to CSV conversion. jq describes itself as a lightweight and flexible command-line JSON processor. You can think of it as the Unix shell tools sed, grep and awk, but for JSON.

jq works on various platforms. Prebuilt binaries are available for Linux, Windows and macOS. See the jq download site for instructions.

For many of the following examples we will use a file named artist.json with the following JSON content:

{
  "name": "Leonardo da Vinci",
  "artworks": [
    {
      "name": "Mona Lisa",
      "type": "Painting"
    },
    {
      "name": "The Last Supper",
      "type": "Fresco"
    }
  ]
}

Pretty-printing JSON and basic jq usage

jq is typically invoked by piping a piece of JSON to its standard input. For example:

echo '{ "foo" : "bar" }' | jq
{
  "foo": "bar"
}

Without a filter argument, jq simply outputs its JSON input. Note that the output has been reformatted: jq pretty-prints JSON by default. This lets us pipe minified JSON to jq and get nicely formatted output.

jq takes a filter as its first argument. The simplest filter is . (the identity filter), which returns its input unchanged. So this example produces the same output as the previous one:

echo '{ "foo" : "bar" }' | jq '.'
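Because pretty-printing is the default, it is sometimes handy to go the other way. The -c (--compact-output) option prints each JSON value on a single line. A quick comparison of both modes (the variable names are only for illustration):

```shell
# Default: pretty-printed output with two-space indentation
pretty=$(echo '{ "foo" : "bar" }' | jq '.')
# -c (--compact-output): each JSON value on a single line
compact=$(echo '{ "foo" : "bar" }' | jq -c '.')
echo "$pretty"
echo "$compact"   # {"foo":"bar"}
```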

Next we add a simple object identifier to the filter, using the artist.json file from above. With .name we select the value of the name key:

cat artist.json | jq '.name'
"Leonardo da Vinci"

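Piping through cat is not strictly necessary: jq also accepts input files as arguments after the filter. A small sketch; it first writes the example artist.json into the current directory so it can run on its own:

```shell
# Recreate the example file so this snippet is self-contained
cat > artist.json <<'EOF'
{
  "name": "Leonardo da Vinci",
  "artworks": [
    { "name": "Mona Lisa", "type": "Painting" },
    { "name": "The Last Supper", "type": "Fresco" }
  ]
}
EOF

# jq reads the file given after the filter, no cat needed
name=$(jq '.name' artist.json)
echo "$name"   # "Leonardo da Vinci"
```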
Array elements can be accessed by index using the [] syntax (indices start at 0):

cat artist.json | jq '.artworks[0]'
{
  "name": "Mona Lisa",
  "type": "Painting"
}

To get the name of the first artwork we use:

cat artist.json | jq '.artworks[0].name'
"Mona Lisa"

If we want the names of all artworks, we simply omit the array index. .artworks[] then outputs each element of the array in turn:

cat artist.json | jq '.artworks[].name'
"Mona Lisa"
"The Last Supper"

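Note that .artworks[].name produces two separate outputs. Wrapping the filter in square brackets collects those outputs into a single JSON array. A sketch using an inline copy of artist.json (the shell variable artist is only for illustration):

```shell
# Inline copy of artist.json so the snippet runs on its own
artist='{"name":"Leonardo da Vinci","artworks":[{"name":"Mona Lisa","type":"Painting"},{"name":"The Last Supper","type":"Fresco"}]}'

# Without brackets: one output per artwork
echo "$artist" | jq '.artworks[].name'

# With brackets: all outputs collected into one array (-c keeps it on one line)
names=$(echo "$artist" | jq -c '[.artworks[].name]')
echo "$names"   # ["Mona Lisa","The Last Supper"]
```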
Processing curl and wget responses

Of course we can also pipe responses from remote systems to jq. This is not a specific feature of jq, but because it is a common use case we will look at two short examples. For these examples we use the public GitHub API to get information about my blog-examples repository.

With curl this is very simple. The following command extracts the name and full_name properties from the GitHub API response (-s hides curl's progress meter):

curl -s https://api.github.com/repos/mscharhag/blog-examples | jq '.name,.full_name'
"blog-examples"
"mscharhag/blog-examples"

Note that we used a comma here to separate two different filters. jq applies both filters to the same input and outputs their results one after the other.
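The comma works the same way on local input. This sketch uses an inline copy of artist.json (the variable artist is only for illustration), so it runs without network access:

```shell
# Inline copy of artist.json
artist='{"name":"Leonardo da Vinci","artworks":[{"name":"Mona Lisa","type":"Painting"},{"name":"The Last Supper","type":"Fresco"}]}'

# Both filters run against the same input; their results are emitted in order
out=$(echo "$artist" | jq '.name, .artworks[0].name')
echo "$out"
# "Leonardo da Vinci"
# "Mona Lisa"
```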

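With wget we need a few extra options to get the response on standard output: -q turns off wget's status messages and -O - writes the downloaded document to stdout instead of a file: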

wget -cq https://api.github.com/repos/mscharhag/blog-examples -O - | jq '.owner.html_url'
"https://github.com/mscharhag"

Pipes, functions and operators

In this section we will look into more ways of filtering JSON data.

With the | operator we can combine two filters. It works similarly to the standard Unix shell pipe: the output of the filter on the left is passed as input to the filter on the right.

Note that .foo.bar is the same as .foo | .bar (the value of .foo is passed to the second filter, which then selects .bar).
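The equivalence can be checked quickly on a small document (made up for this example):

```shell
doc='{"foo":{"bar":42}}'

# Path expression and explicit pipe select the same value
a=$(echo "$doc" | jq '.foo.bar')
b=$(echo "$doc" | jq '.foo | .bar')
echo "$a $b"   # 42 42
```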

Pipes can be combined with functions. For example, we can use the keys function to get the keys of a JSON object:

cat artist.json | jq '. | keys'
[
  "artworks",
  "name"
]
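The output lists artworks before name, although name comes first in the file: keys returns the keys in sorted order. If the original order matters, keys_unsorted can be used instead. A sketch with an inline copy of artist.json (variable names are illustrative):

```shell
# Inline copy of artist.json
artist='{"name":"Leonardo da Vinci","artworks":[{"name":"Mona Lisa","type":"Painting"},{"name":"The Last Supper","type":"Fresco"}]}'

# keys sorts the keys; keys_unsorted keeps the order of the input
sorted=$(echo "$artist" | jq -c 'keys')
unsorted=$(echo "$artist" | jq -c 'keys_unsorted')
echo "$sorted"     # ["artworks","name"]
echo "$unsorted"   # ["name","artworks"]
```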

With the length function we can get the number of elements in an array:

cat artist.json | jq '.artworks | length'
2

The output of the length function depends on the type of its input:

  • If a string is passed, then it returns the number of characters
  • For arrays the number of elements is returned
  • For objects the number of key-value pairs is returned
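The cases above can be tried without any input file by using jq's -n (--null-input) option and literal values in the filter. As an extra case, null has a length of 0:

```shell
# -n: run the filter once with null as input, ignoring stdin
s=$(jq -n '"Mona Lisa" | length')          # number of characters
a=$(jq -n '[1, 2, 3] | length')            # number of elements
o=$(jq -n '{"a": 1, "b": 2} | length')     # number of key-value pairs
z=$(jq -n 'null | length')                 # null has length 0
echo "$s $a $o $z"   # 9 3 2 0
```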

We can combine the length function with comparison operators:

cat artist.json | jq '.artworks | length < 5'
true

Assume we want only the artworks whose type is Painting. We can accomplish this using the select function:

cat artist.json | jq '.artworks[] | select(.type == "Painting")'
{
  "name": "Mona Lisa",
  "type": "Painting"
}

select takes a boolean expression and passes through only those inputs for which the expression is true.
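Because select simply passes matching inputs through, its output can be piped on like any other filter, and wrapping the whole expression in [] collects the matches into an array. A sketch with an inline copy of artist.json (the variable names are only for illustration):

```shell
# Inline copy of artist.json
artist='{"name":"Leonardo da Vinci","artworks":[{"name":"Mona Lisa","type":"Painting"},{"name":"The Last Supper","type":"Fresco"}]}'

# Names of all artworks that are not paintings
others=$(echo "$artist" | jq '.artworks[] | select(.type != "Painting") | .name')
echo "$others"      # "The Last Supper"

# Matching objects collected into a single array
paintings=$(echo "$artist" | jq -c '[.artworks[] | select(.type == "Painting")]')
echo "$paintings"   # [{"name":"Mona Lisa","type":"Painting"}]
```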

Transforming JSON documents

In this section we will transform the input JSON document into a completely different format.

We start with this:

cat artist.json | jq '{(.name): "foo"}'
{
  "Leonardo da Vinci": "foo"
}

Here we create a new JSON object that uses the value of .name as its key. To use an expression as an object key we need to put parentheses around it (this is not required for values, as we will see in the next example).
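The difference the parentheses make is easy to see side by side. Without them, name is taken as a literal key; with them, the expression is evaluated against the input. A sketch with an inline copy of artist.json:

```shell
# Inline copy of artist.json
artist='{"name":"Leonardo da Vinci","artworks":[{"name":"Mona Lisa","type":"Painting"},{"name":"The Last Supper","type":"Fresco"}]}'

# Literal key "name"
literal=$(echo "$artist" | jq -c '{name: "foo"}')
# Key computed from the value of .name
computed=$(echo "$artist" | jq -c '{(.name): "foo"}')
echo "$literal"    # {"name":"foo"}
echo "$computed"   # {"Leonardo da Vinci":"foo"}
```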

Now let's add the list of artworks as the value:

cat artist.json | jq '{(.name): .artworks}'
{
  "Leonardo da Vinci": [
    {
      "name": "Mona Lisa",
      "type": "Painting"
    },
    {
      "name": "The Last Supper",
      "type": "Fresco"
    }
  ]
}

Next we apply the map function to the artworks array:

cat artist.json | jq '{(.name): (.artworks | map(.name) )}'
{
  "Leonardo da Vinci": [
    "Mona Lisa",
    "The Last Supper"
  ]
}

map lets us transform each array element with an expression. Here, we simply select the name value of each element.
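The jq manual describes map(x) as equivalent to [.[] | x]: iterate over the array, apply the expression to each element and collect the results. Both forms can be compared directly (inline copy of artist.json, illustrative variable names):

```shell
# Inline copy of artist.json
artist='{"name":"Leonardo da Vinci","artworks":[{"name":"Mona Lisa","type":"Painting"},{"name":"The Last Supper","type":"Fresco"}]}'

a=$(echo "$artist" | jq -c '.artworks | map(.name)')
b=$(echo "$artist" | jq -c '.artworks | [.[] | .name]')
echo "$a"   # ["Mona Lisa","The Last Supper"]
echo "$b"   # ["Mona Lisa","The Last Supper"]
```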

Using the join function we can join the array elements into a single string:

cat artist.json | jq '{(.name): (.artworks | map(.name) | join(", "))}'
{
  "Leonardo da Vinci": "Mona Lisa, The Last Supper"
}

The resulting JSON document now contains only the artist and a comma-separated list of his artworks.

Converting JSON to CSV

We can also use jq to perform simple JSON to CSV transformations. As an example, we will transform the artworks array of our artist.json file to CSV.

We start by adding the .artworks[] filter:

cat artist.json | jq '.artworks[]'
{
  "name": "Mona Lisa",
  "type": "Painting"
}
{
  "name": "The Last Supper",
  "type": "Fresco"
}

This deconstructs the artworks array into separate JSON objects.

Note: If we used .artworks (without []), we would get a single array containing both elements. By adding [] we get two separate JSON objects that we can now process individually.
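One way to see the difference is to count the values jq emits. With -c every output value is printed on its own line, so counting lines counts outputs. A sketch with an inline copy of artist.json (variable names are illustrative):

```shell
# Inline copy of artist.json
artist='{"name":"Leonardo da Vinci","artworks":[{"name":"Mona Lisa","type":"Painting"},{"name":"The Last Supper","type":"Fresco"}]}'

# .artworks emits one value (the whole array) ...
one=$(( $(echo "$artist" | jq -c '.artworks' | wc -l) ))
# ... while .artworks[] emits one value per element
two=$(( $(echo "$artist" | jq -c '.artworks[]' | wc -l) ))
echo "$one $two"   # 1 2
```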

Next we convert these JSON objects to arrays. For this we pipe the JSON objects into a new filter:

cat artist.json | jq '.artworks[] | [.name, .type]'
[
  "Mona Lisa",
  "Painting"
]
[
  "The Last Supper",
  "Fresco"
]

The new filter returns a JSON array containing two elements (selected by .name and .type).

Now we can apply the @csv format, which formats a JSON array as a CSV row:

cat artist.json | jq '.artworks[] | [.name, .type] | @csv'
"\"Mona Lisa\",\"Painting\""
"\"The Last Supper\",\"Fresco\""

By default, jq writes its results as JSON. Since each CSV row is a string, we now see two JSON strings with escaped quotes, which is not very useful.

To get the raw CSV output we need to add the -r (--raw-output) option:

cat artist.json | jq -r '.artworks[] | [.name, .type] | @csv'
"Mona Lisa","Painting"
"The Last Supper","Fresco"

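A header row can be added with the comma operator. Since | has lower precedence than the comma, the final @csv is applied to every array produced on its left. The sketch also shows how @csv handles other values: embedded double quotes are doubled and numbers are left unquoted. It uses an inline copy of artist.json, and the variable names are only for illustration:

```shell
# Inline copy of artist.json
artist='{"name":"Leonardo da Vinci","artworks":[{"name":"Mona Lisa","type":"Painting"},{"name":"The Last Supper","type":"Fresco"}]}'

# Header row first, then one row per artwork
csv=$(echo "$artist" | jq -r '["name", "type"], (.artworks[] | [.name, .type]) | @csv')
echo "$csv"
# "name","type"
# "Mona Lisa","Painting"
# "The Last Supper","Fresco"

# Embedded quotes are doubled, numbers stay unquoted
row=$(jq -rn '["He said \"hi\"", 42] | @csv')
echo "$row"   # "He said ""hi""",42
```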
Summary

jq is a powerful tool for command-line JSON processing. Simple tasks like pretty-printing or extracting a specific value from a JSON document are quickly done in a shell with jq. Furthermore, its filter syntax, combined with pipes, functions and operators, allows us to do more complex operations. We can transform input documents into completely different output documents and even convert JSON to CSV.

If you want to learn more about jq, take a look at its excellent documentation.

 
