Following "Split/Slice large JSON using jq", we are able to successfully slice a huge input file into smaller chunks of data based on array size.
We would now like to add a new JSON element to each array entry, holding an incrementing sequence number (running from 1 up to the length of the original array), and also filter the entries so that they are unique on a few columns.
Input:
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
{"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
]
}
Expected output after adding the additional key:
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"rownum":1,"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"rownum":2,"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
{"rownum":3,"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
{"rownum":4,"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
]
}
Expected output after filtering for unique entries (by state, city, postal) and slicing with an array size of 2:
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"rownum":1,"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"rownum":3,"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"}]}
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"rownum":4,"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
]
}
The sample below was used to filter for unique entries on a few columns, but it does not achieve optimal performance:
jq -r --argjson size 2 ' .add |= unique_by({city,state,postal}) | del(.add) as $object | (.add|_nwise($size) | ("\t", $object + {add:.} )) ' input.json | awk ' /^\t/ {fn++; next} { print >> "part-" fn ".json"}'
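For reference, here is a minimal sketch of how the three steps described above (number the rows, keep unique rows, slice) might be combined in a single jq pass. It is only an illustration under some assumptions: `chunks` is a hypothetical helper name defined inline (used here instead of the undocumented `_nwise`), `input.json` is recreated from the sample input, and the expected output above is assumed to want the first row of each duplicate group kept. With `-c`, jq writes one compact object per line, so awk can write each line to its own file without a tab marker.

```shell
# Recreate the sample input from the question.
cat > input.json <<'EOF'
{"recDt":"2021-01-05","country":"US","name":"ABC","number":"9828",
 "add":[
  {"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
  {"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77830"},
  {"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
  {"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77832"}
 ]}
EOF

jq -c --argjson size 2 '
  # Hypothetical helper: emit consecutive slices of at most $n elements.
  def chunks($n): range(0; length; $n) as $i | .[$i:$i + $n];

  # 1. Add rownum as the 1-based position in the original array.
  .add |= (to_entries | map({rownum: (.key + 1)} + .value))
  # 2. Keep one row per (state, city, postal) combination.
  | .add |= unique_by({state, city, postal})
  # 3. Emit one copy of the outer object per chunk of .add.
  | del(.add) as $object
  | .add | chunks($size) | $object + {add: .}
' input.json |
# One JSON object per line: write line N to part-N.json and close it right away.
awk '{ f = "part-" NR ".json"; print > f; close(f) }'
```

Numbering the rows before `unique_by` is what lets the surviving rows keep their original positions (1, 3 and 4 in the sample). Closing each file in awk right after writing avoids holding many file handles open when the input produces a large number of parts; whether this is actually faster than the original command would need to be measured on the real data.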