Get only top level ids from json using bash

Question

I have a large chunk of JSON containing about 10 elements. Each element has an ID, a few other attributes, and a links attribute whose entries sometimes have IDs of their own. Is there a way to get only the top-level ID of each element using bash (preferably without external libraries)?

Here is an example:

{
"page": {
    "size": 10,
    "number": 1,
    "totalPages": 1,
    "totalElements": 10,
    "resultSetId": "TODO",
    "duration": 999
},
"content": [
    {
        "id": "fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07",
        "name": "volume 0",
        "userTags": [],
        "links": [
            {
                "rel": "whatever",
                "href": "/whatever/67b46e10-21ed-4394-b706-9eb61d75933e",
                "id": "67b46e10-21ed-4394-b706-9eb61d75933e"
            },
            {
                "rel": "whatever_else",
                "href": "/whatever_else/fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07/workflowList"
            },
            {
                "rel": "stuff",
                "href": "/stuff/fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07/planList"
            },
            {
                "rel": "self",
                "href": "/self/fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07",
                "id": "fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07"
            },
            {
                "rel": "container",
                "href": "/container/575a0c38-c60a-4d52-ba38-cb20f4b6d9e7",
                "id": "575a0c38-c60a-4d52-ba38-cb20f4b6d9e7"
            },
            {
                "rel": "parent",
                "href": "/parent/85b7f0e7-b946-4bc4-9ca6-582a5ca08c51",
                "id": "85b7f0e7-b946-4bc4-9ca6-582a5ca08c51"
            }
        ],
        "discovered": false,
        "lastUpdated": "2015-11-20T09:33:05.757-0800",
        "nativeUri": null,
        "vendor": null,
        "suspended": [],
        "enabled": []
    },
    {
        "id": "4292014f-01cd-4369-9cc0-7bf41a8be53d",
        "name": "Storage_Group_001",
        "attributes": {},
        "userTags": [],
        "links": [
            {
                "rel": "stuff",
                "href": "/stuff/67b46e10-21ed-4394-b706-9eb61d75933e",
                "id": "67b46e10-21ed-4394-b706-9eb61d75933e"
            },
            {
                "rel": "something",
                "href": "/something/4292014f-01cd-4369-9cc0-7bf41a8be53d/workflowList"
            },
            {
                "rel": "whatever",
                "href": "/whatever/4292014f-01cd-4369-9cc0-7bf41a8be53d/planList"
            },
            {
                "rel": "self",
                "href": "/self/4292014f-01cd-4369-9cc0-7bf41a8be53d",
                "id": "4292014f-01cd-4369-9cc0-7bf41a8be53d"
            },
            {
                "rel": "container",
                "href": "/stuff/575a0c38-c60a-4d52-ba38-cb20f4b6d9e7",
                "id": "575a0c38-c60a-4d52-ba38-cb20f4b6d9e7"
            }
        ],
        "lastUpdated": "2015-11-18T06:37:56.739-0800",
        "nativeUri": null,
        "vendor": null,
        "suspended": [],
        "enabled": []
    },
    {
        "id": "896aca64-17a6-4acb-a93c-562424dc1bc4",
        "name": "volume 4",
        "attributes": {},
...

So basically, I want the top-level id of each element, but none of the ids in the links sections. I got close with awk, and also with perl, but the number of ids in each links section is impossible to predict. Here is my awk attempt, which assumes every fifth id is a top-level one (I dumped the JSON into a temp file so I didn't have to curl every time):

grep -Po '(?<="id":")[^"]*' tmp.txt | awk 'count++ % 5 == 0'

awk is external to bash, but it's still the wrong tool. Use something like jq. — chepner
Is it possible to do without jq? I'm using openSuse and jq is not installed by default, so I don't want to add a dependency for everyone that will be using the script in the future. — user3270760

chepner · Accepted Answer · 2015-11-24T19:34:43

With jq:

jq '.content[] | .id' some.json
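
By default the filter above prints each id as a quoted JSON string; adding -r prints the bare values. If jq really cannot be installed, something like the following might serve as a stopgap. It is only a sketch: the function names and sample.json are made up for illustration, and the awk fallback is not a JSON parser. It assumes pretty-printed input with one key per line, as in the question, and no braces inside string values.

```shell
#!/usr/bin/env bash
# Sketch only: the function names and sample.json are hypothetical.

# Preferred: a real JSON parser. -r prints raw strings without quotes.
ids_with_jq() {
    jq -r '.content[].id' "$1"
}

# Fallback without jq. NOT a JSON parser: assumes pretty-printed input
# (one key per line) and no braces inside string values.
ids_with_awk() {
    awk '
        # At the root level, remember whether the current key is "content".
        depth == 1 && /^[ \t]*"[^"]*"[ \t]*:/ {
            in_content = ($0 ~ /^[ \t]*"content"[ \t]*:/)
        }
        {
            line = $0
            # Depth 2 inside "content" is a direct element of the array;
            # ids under "links" sit at depth 3 and are skipped.
            if (in_content && depth == 2 && match(line, /"id"[ \t]*:[ \t]*"[^"]*"/)) {
                val = substr(line, RSTART, RLENGTH)
                sub(/^"id"[ \t]*:[ \t]*"/, "", val)
                sub(/"$/, "", val)
                print val
            }
            # Track brace nesting using the braces on this line.
            depth += gsub(/[{]/, "{", line) - gsub(/[}]/, "}", line)
        }
    ' "$1"
}

# Use jq when it is available, otherwise fall back to awk.
top_level_ids() {
    if command -v jq >/dev/null 2>&1; then
        ids_with_jq "$1"
    else
        ids_with_awk "$1"
    fi
}

# Small sample in the same shape as the data in the question.
cat > sample.json <<'EOF'
{
"page": {
    "size": 2
},
"content": [
    {
        "id": "fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07",
        "links": [
            {
                "rel": "container",
                "id": "575a0c38-c60a-4d52-ba38-cb20f4b6d9e7"
            }
        ]
    },
    {
        "id": "4292014f-01cd-4369-9cc0-7bf41a8be53d",
        "links": [
            {
                "rel": "self",
                "id": "4292014f-01cd-4369-9cc0-7bf41a8be53d"
            }
        ]
    }
]
}
EOF

top_level_ids sample.json
# fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07
# 4292014f-01cd-4369-9cc0-7bf41a8be53d
```

The awk branch will break on compact (single-line) JSON or on values that contain braces, so it is best treated as a last resort for output whose layout you control.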
