Load JSON file in Pig script inside Hortonworks Sandbox

Question

I'm new to the whole Hadoop/Hortonworks/Pig stuff, so excuse me for the question.

I have installed the Hortonworks Sandbox. I'm trying to load a Twitter JSON file and run some queries on it, but I'm currently stuck at the loading step.

I know that I should use Elephant-bird to load a JSON file with JsonLoader() without specifying the JSON schema, so I've downloaded Elephant-bird from the Git repo and included the jar file

Elephant-bird\repo\com\twitter\elephant-bird\2.2.3\elephant-bird-2.2.3.jar

inside the Hortonworks Sandbox. Here is my Pig script:

REGISTER elephant-bird-2.2.3.jar;
Json1 = LOAD 'JSON/sample.tweets' JsonLoader();
DESCRIBE Json1;
STORE Json1 INTO 'tweeterOutput';

Unfortunately, I cannot get any results from running this script. I've tried both the STORE and DUMP commands.

I'm probably doing several things wrong in this process, so any help would be appreciated!

Donald Miner · Accepted Answer · 2013-11-05T21:10:16 · 1 vote

You are missing the USING keyword:

Json1 = LOAD 'JSON/sample.tweets' USING JsonLoader();
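
For reference, the full script with that one change might look like the sketch below. The jar name and paths are copied from the question. The package-qualified class name is an assumption on my part: it presumes the Elephant-bird loader is the one intended, since an unqualified JsonLoader may instead resolve to Pig's built-in loader in Pig 0.10 and later.

```pig
-- Register the Elephant-bird jar (name as given in the question).
REGISTER elephant-bird-2.2.3.jar;

-- USING introduces the load function. The fully qualified name is an
-- assumption that Elephant-bird's loader, not Pig's built-in one, is wanted.
Json1 = LOAD 'JSON/sample.tweets' USING com.twitter.elephantbird.pig.load.JsonLoader();

-- Print the schema, then write the relation out.
DESCRIBE Json1;
STORE Json1 INTO 'tweeterOutput';
```

If the job then fails with a missing-class error, the loader's own dependencies (for example, a JSON parsing library) may also need to be registered. Which jars are required depends on the Elephant-bird version.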
