
I have a large chunk of json which contains about 10 unique elements. Each of these elements contains an ID, a few other attributes, and a links attribute (some of which also have IDs). Is there a way that I can get only the top level ID in each element of the json using bash (and preferably no external libraries)?

Here is an example:

{
"page": {
    "size": 10,
    "number": 1,
    "totalPages": 1,
    "totalElements": 10,
    "resultSetId": "TODO",
    "duration": 999
},
"content": [
    {
        "id": "fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07",
        "name": "volume 0",
        "userTags": [],
        "links": [
            {
                "rel": "whatever",
                "href": "/whatever/67b46e10-21ed-4394-b706-9eb61d75933e",
                "id": "67b46e10-21ed-4394-b706-9eb61d75933e"
            },
            {
                "rel": "whatever_else",
                "href": "/whatever_else/fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07/workflowList"
            },
            {
                "rel": "stuff",
                "href": "/stuff/fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07/planList"
            },
            {
                "rel": "self",
                "href": "/self/fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07",
                "id": "fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07"
            },
            {
                "rel": "container",
                "href": "/container/575a0c38-c60a-4d52-ba38-cb20f4b6d9e7",
                "id": "575a0c38-c60a-4d52-ba38-cb20f4b6d9e7"
            },
            {
                "rel": "parent",
                "href": "/parent/85b7f0e7-b946-4bc4-9ca6-582a5ca08c51",
                "id": "85b7f0e7-b946-4bc4-9ca6-582a5ca08c51"
            }
        ],
        "discovered": false,
        "lastUpdated": "2015-11-20T09:33:05.757-0800",
        "nativeUri": null,
        "vendor": null,
        "suspended": [],
        "enabled": []
    },
    {
        "id": "4292014f-01cd-4369-9cc0-7bf41a8be53d",
        "name": "Storage_Group_001",
        "attributes": {},
        "userTags": [],
        "links": [
            {
                "rel": "stuff",
                "href": "/stuff/67b46e10-21ed-4394-b706-9eb61d75933e",
                "id": "67b46e10-21ed-4394-b706-9eb61d75933e"
            },
            {
                "rel": "something",
                "href": "/something/4292014f-01cd-4369-9cc0-7bf41a8be53d/workflowList"
            },
            {
                "rel": "whatever",
                "href": "/whatever/4292014f-01cd-4369-9cc0-7bf41a8be53d/planList"
            },
            {
                "rel": "self",
                "href": "/self/4292014f-01cd-4369-9cc0-7bf41a8be53d",
                "id": "4292014f-01cd-4369-9cc0-7bf41a8be53d"
            },
            {
                "rel": "container",
                "href": "/stuff/575a0c38-c60a-4d52-ba38-cb20f4b6d9e7",
                "id": "575a0c38-c60a-4d52-ba38-cb20f4b6d9e7"
            }
        ],
        "lastUpdated": "2015-11-18T06:37:56.739-0800",
        "nativeUri": null,
        "vendor": null,
        "suspended": [],
        "enabled": []
    },
    {
        "id": "896aca64-17a6-4acb-a93c-562424dc1bc4",
        "name": "volume 4",
        "attributes": {},
...

So basically, I just want the top-level id of each element, but none of the ids in the links sections. I got close with awk, and also with perl, but the number of ids in each links section is impossible to predict. Here was my awk attempt (it assumed that every fifth id was a top-level one; I also dumped the JSON into a temp file so I didn't have to curl every time):

# Print every 5th "id" value (\K drops the matched "id": " prefix from grep's output)
awk 'count++ % 5 == 0' <(grep -Po '"id":\s*"\K[^"]*' tmp.txt)
awk is external to bash, but it's still the wrong tool. Use something like jq. - chepner
Is it possible to do without jq? I'm using openSUSE and jq is not installed by default, so I don't want to add a dependency for everyone who will be using the script in the future. - user3270760
Not reliably; you should use a JSON parser to parse JSON. - chepner
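If jq really cannot be added, a JSON parser that may already be installed is another option. The sketch below assumes Python 3 is present on the target machines (worth checking rather than assuming, since it is also an external program) and uses only its standard-library json module. The function name top_level_ids_py is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "id" of each element of "content",
# parsing the JSON with Python 3's standard-library json module.
# Usage: top_level_ids_py file.json   (or pipe the JSON into it)
top_level_ids_py() {
  python3 -c '
import json
import sys

# Read from the file named on the command line, or from stdin if none was given.
source = open(sys.argv[1]) if len(sys.argv) > 1 else sys.stdin
data = json.load(source)
for element in data["content"]:
    print(element["id"])
' "$@"
}
```

Because a real parser is used, the ids inside "links" are never visited, no matter how many there are or how the JSON is formatted.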

2 Answers


With jq:

jq '.content[] | .id' some.json
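A slightly extended sketch of the same filter: -r (--raw-output) prints the ids without their surrounding JSON quotes, which is usually what a shell script wants. The function name top_level_ids_jq is hypothetical. Note that jq only accepts valid JSON, so input with a trailing comma before a closing brace will be rejected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: .content[].id selects only the "id" key of each
# element of "content", so the ids nested inside "links" are never printed.
# Usage: top_level_ids_jq some.json   (or pipe the JSON into it)
top_level_ids_jq() {
  jq -r '.content[].id' "$@"
}
```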

Here is an awk-only "solution" (calling it a solution is a bit optimistic, as awk is not a JSON parser):

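awk '/[{]/ { count++ }                     # an opening brace goes one level deeper
     /[}]/ { count-- }                     # a closing brace comes back up
     /"id":/ && count == 2 { print $0 }    # level 2 = directly inside a "content" element
' inputFile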

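We count the opening and closing curly braces to track how deeply nested each line is.
Lines that contain "id": at nesting level 2 (directly inside an element of the "content" array, not inside one of its links) are printed. The output for your example: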

"id": "fbc67d7a-50a3-4c1c-9a75-4db0ba5dcb07",
"id": "4292014f-01cd-4369-9cc0-7bf41a8be53d",
"id": "896aca64-17a6-4acb-a93c-562424dc1bc4",

This solution assumes that there is at most one brace of each type ({ or }) per line, and that no string value contains a brace.
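If you want only the bare id values rather than whole lines, a minimal variation of the same brace-counting approach splits each matching line on double quotes and prints the fourth field. It carries the same assumptions as above (pretty-printed input, one "id" per line, no braces inside string values), and the function name top_level_ids is only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper using the brace-counting idea from the answer above.
# Usage: top_level_ids inputFile   (or pipe the JSON into it)
top_level_ids() {
  awk '
    /[{]/ { count++ }
    /[}]/ { count-- }
    /"id":/ && count == 2 {
      # Splitting   "id": "VALUE",   on double quotes puts VALUE in field 4
      split($0, parts, "\"")
      print parts[4]
    }
  ' "$@"
}
```

For the example JSON this prints the three ids without the "id": prefix, the quotes, or the trailing comma.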

Alternatively, you might have a look at jsawk, which is like awk, but for JSON. (If you are able to download it and make it executable with chmod, it is probably the better option.)