I have arrays of JSON objects like the ones below. Each array, enclosed by [ and ], is on a single line.
[{"event":0,"properties":{"color":"red","connectionType":2}},{"event":3,"properties":{"color":"blue","connectionType":4}},{"event":45,"properties":{"color":"green","connectionType":3}}]
[{"event":0,"properties":{"color":"red","connectionType":5}},{"event":1,"properties":{"color":"blue","connectionType":6}}]
Here is the same data in an easier-to-read format:
[
{"event":0, "properties":{"color":"red","connectionType":2}},
{"event":3, "properties":{"color":"blue","connectionType":4}},
{"event":45, "properties":{"color":"green","connectionType":3}}
]
[
{"event":0, "properties":{"color":"red","connectionType":5}},
{"event":1, "properties":{"color":"blue","connectionType":6}}
]
Some things to note: all the JSON objects inside a given [ ] are on a single line. The number of objects on each line varies. The number of fields inside properties also varies.
What I want to do with this data is take each JSON object and convert it to tabular format, as comma-separated or tab-separated values:
event color connectionType
0 red 2
3 blue 4
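For reference, a minimal sketch of this flattening in plain Python (not Pig) could look like the following. The function names flatten_line and to_tsv are made up for illustration, and the sketch assumes every input line is one valid JSON array whose objects have an "event" key and an optional "properties" object.

```python
import json

def flatten_line(line):
    """Turn one line holding a JSON array of events into a list of flat dicts."""
    rows = []
    for obj in json.loads(line):
        row = {"event": obj["event"]}
        # The keys inside "properties" vary between objects, so copy whatever is present.
        row.update(obj.get("properties", {}))
        rows.append(row)
    return rows

def to_tsv(rows, columns):
    """Render flat dicts as tab-separated lines; missing fields become empty strings."""
    return ["\t".join(str(row.get(col, "")) for col in columns) for row in rows]

# Sample input: the first two objects from the example data above.
sample = '[{"event":0,"properties":{"color":"red","connectionType":2}},{"event":3,"properties":{"color":"blue","connectionType":4}}]'
for tsv_line in to_tsv(flatten_line(sample), ["event", "color", "connectionType"]):
    print(tsv_line)
```

Passing the column list explicitly keeps the output rectangular even when some objects lack a property.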
I've looked at a few tools that Pig can use to read JSON structures, namely elephant-bird, but I can't quite get them to work on my data.
I'm hoping to get pointers on alternative solutions, or example code using elephant-bird or other Pig JSON parsers. My end goal is really just to capture a subset of events and properties and load them into Hive.
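To make the end goal concrete, here is a rough sketch, again in plain Python rather than Pig, of keeping only some events and some properties and writing them as tab-separated text. The names select_rows, WANTED_EVENTS and WANTED_PROPERTIES are hypothetical, as are the chosen event ids and property names. It assumes the Hive table would be declared with tab-delimited fields, and an in-memory buffer stands in for the output file.

```python
import csv
import io
import json

WANTED_EVENTS = {0, 45}        # hypothetical: event ids to keep
WANTED_PROPERTIES = ["color"]  # hypothetical: property names to keep

def select_rows(lines, wanted_events, wanted_properties):
    """Yield [event, prop1, prop2, ...] for each object whose event id is wanted."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for obj in json.loads(line):
            if obj["event"] not in wanted_events:
                continue
            props = obj.get("properties", {})
            # Properties absent from an object are written as empty fields.
            yield [obj["event"]] + [props.get(name, "") for name in wanted_properties]

lines = [
    '[{"event":0,"properties":{"color":"red","connectionType":2}},{"event":3,"properties":{"color":"blue","connectionType":4}},{"event":45,"properties":{"color":"green","connectionType":3}}]',
    '[{"event":0,"properties":{"color":"red","connectionType":5}},{"event":1,"properties":{"color":"blue","connectionType":6}}]',
]

out = io.StringIO()  # stand-in for an output file
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerows(select_rows(lines, WANTED_EVENTS, WANTED_PROPERTIES))
print(out.getvalue(), end="")
```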