
I am new to Pig and am trying to parse a JSON file with the following structure:

{"id1":197,"id2":[ 
    {"id3":"109.11.11.0","id4":"","id5":1391233948301},
    {"id3":"10.10.15.81","id4":"","id5":1313393100648},
    ...
]}

The file above is called jsonfile.txt. I load it like this:

alias = load 'jsonfile.txt' using JsonLoader('id1:int,id2:[id3:chararray,id4:chararray,id5:chararray]');

This is the error I get:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input 'id3' expecting RIGHT_BRACKET

Do you know what I could be doing wrong?

Try checking the whole JSON here. Maybe it is just a trailing comma. - kirilloid
I just checked: the JSON is correctly formatted. - user1386101

1 Answer


The schema string you pass to JsonLoader is not well formed.

The formats for complex data types are shown here:

Tuple: enclosed by (), items separated by ","
    Non-empty tuple: (item1,item2,item3)
    Empty tuple is valid: ()
Bag: enclosed by {}, tuples separated by ","
    Non-empty bag: {(tuple1),(tuple2),(tuple3)}
    Empty bag is valid: {}
Map: enclosed by [], items separated by ",", key and value separated by "#"
    Non-empty map: [key1#value1,key2#value2]
    Empty map is valid: []

Source: http://pig.apache.org/docs/r0.10.0/func.html#jsonloadstore

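In other words, [] does not denote an array in a Pig schema. It denotes a map (an associative table), in which "#" separates each key from its value. That is why the parser stops at 'id3' and expects a closing bracket. Try a tuple (parentheses) instead or, since id2 holds a list of objects, a bag of tuples (curly braces around parentheses).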

'id1:int,id2:(id3:chararray,id4:chararray,id5:chararray)'

OR

'id1:int,id2:{(id3:chararray,id4:chararray,id5:chararray)}'

I couldn't test it and have never tried Pig, but according to the documentation it should work.

(Based on the following example from the documentation.)

a = load 'a.json' using JsonLoader('a0:int,a1:{(a10:int,a11:chararray)},a2:(a20:double,a21:bytearray),a3:[chararray]');
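
For the file in the question, a minimal, untested sketch of the bag variant might look like the following. The relation names records and flat are made up for illustration, and the sketch assumes each JSON record sits on a single line of jsonfile.txt, which as far as I know is what JsonLoader expects.

```pig
-- id2 is a JSON array of objects, so it is declared as a bag of tuples.
records = LOAD 'jsonfile.txt'
    USING JsonLoader('id1:int,id2:{(id3:chararray,id4:chararray,id5:chararray)}');

-- FLATTEN un-nests the bag: one output row per (id3, id4, id5) tuple,
-- each paired with the id1 of its parent record.
flat = FOREACH records GENERATE id1, FLATTEN(id2) AS (id3, id4, id5);

DUMP flat;
```

If the load succeeds, running DESCRIBE records; should report id2 as a bag rather than a map, which is a quick way to confirm the schema was parsed as intended.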