The JSON data that I have is:

{"time": "2015-06-30T23:00:00Z",
    "type": "analysis",
    "revision": "0.8",
    "hostname": "iem6.local",
    "data": [
    {"gid": 1, "tmpc": 28.00, "wawa": [""], "ptype": 10, "dwpc": 17.40, "smps": 6.2, "drct": 99, "vsby": 16.093, "roadtmpc": 39.10,"srad": 77.61, "snwd": 0.00, "pcpn": 0.00},
{"gid": 213840, "tmpc": 22.00, "wawa": [""], "ptype": 10, "dwpc": 13.70, "smps": 5.7, "drct": 350, "vsby": 16.093, "roadtmpc": 32.70,"srad": 249.50, "snwd": 0.00, "pcpn": 0.00}]}

I am trying to load this data using the JsonLoader of Apache Pig.

data_raw = LOAD '205006.json' using JsonLoader('time:chararray,type:chararray,revision:chararray,hostname:chararray,data:(gid:int,tmpc:float,wawa:{(a:chararray)},ptype:int,dwpc:float)');

However, the output I get when I dump the result is incorrect:

(2015-06-30T23:00:00Z,,,,)
(,,,,)
(,,,,)
(,,,,)
(,,,,)
(1,28.00,[,],)
(2,28.00,[,],)

The warnings thrown are:

2016-10-24 15:43:55,852 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, returning null for {"time": "2015-06-30T23:00:00Z",
2016-10-24 15:43:55,871 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record     "type": "analysis",
2016-10-24 15:43:55,872 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record     "revision": "0.8",
2016-10-24 15:43:55,872 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record     "hostname": "iem6.local",
2016-10-24 15:43:55,872 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record     "data": [
2016-10-24 15:43:55,872 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad tuple field, could not find start of object, field 4
2016-10-24 15:43:55,873 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find end of record     {"gid": 1, "tmpc": 28.00, "wawa": [""], "ptype": 10, "dwpc": 17.40, "smps": 6.2, "drct": 99, "vsby": 16.093, "roadtmpc": 39.10,"srad": 77.61, "snwd": 0.00, "pcpn": 0.00},
2016-10-24 15:43:55,873 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad tuple field, could not find start of object, field 4

I can't use Elephant Bird for this.

Can you please post your full JSON? Maybe your JSON is not valid (meaning you are missing curly brackets or square brackets). You can check your JSON's validity using jsonlint.com - Rijul
I edited the sample JSON data and included the first and last data points as a sample - Pranamesh
Is it multi-line or single-line JSON? - Rahul Sharma

1 Answer

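First, join your JSON onto a single line. Keep in mind that JsonLoader expects one JSON object per line; the "could not find start of record" warnings appear because each line of your file is being parsed as a separate record.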
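If the file cannot be joined by hand, the step above can be done with a small script. The following is a minimal sketch in Python (not part of Pig); the names `compact_json` and `compact_file` and the output file name are hypothetical, chosen only for illustration. It relies on Python 3.7+ keeping object keys in their original order when parsing.

```python
import json


def compact_json(text):
    """Parse a (possibly multi-line) JSON document and return it as one line."""
    # json.loads keeps the keys in their original order (Python 3.7+),
    # and json.dumps without an indent argument emits no line breaks.
    return json.dumps(json.loads(text))


def compact_file(src_path, dst_path):
    """Rewrite the JSON document in src_path as a single line in dst_path."""
    with open(src_path) as src:
        one_line = compact_json(src.read())
    with open(dst_path, "w") as dst:
        dst.write(one_line + "\n")


# Example call (the output file name is an assumption):
# compact_file("205006.json", "205006_oneline.json")
```

Note that numbers are re-serialized, so a value such as 28.00 is written as 28.0; the numeric value itself does not change.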

Second, use the Pig command below:

data_raw = LOAD '205006.json' using JsonLoader('time:chararray,type:chararray,revision:chararray,hostname:chararray,data:{(gid:int,tmpc:float,wawa:{(chararray)},ptype:int,dwpc:float,smps:float,drct:int,vsby:float,roadtmpc:float,srad:float,snwd:float,pcpn:float)}');

The schema should describe every field in the JSON string, in the same order in which the fields appear.