
I'm new to the whole Hadoop/Hortonworks/Pig stack, so please excuse the basic question.

I have installed the Hortonworks Sandbox. I'm trying to load a Twitter JSON file and run some queries on it, but I'm currently stuck at loading the file.

I know that I should use Elephant-bird to load a JSON file with JsonLoader() (without specifying the JSON schema), so I downloaded Elephant-bird from the git repo and included the jar file

Elephant-bird\repo\com\twitter\elephant-bird\2.2.3\elephant-bird-2.2.3.jar

inside the Hortonworks Sandbox. Here is my Pig script:

REGISTER elephant-bird-2.2.3.jar;
Json1 = LOAD 'JSON/sample.tweets' JsonLoader();
DESCRIBE Json1;
STORE Json1 INTO 'tweeterOutput';

Unfortunately, I don't get any results when I run this script. I've tried both STORE and DUMP.

I'm probably doing several things wrong in this process, so any help is appreciated!


2 Answers


You are missing the USING keyword:

Json1 = LOAD 'JSON/sample.tweets' USING JsonLoader();

Fix the following:

  1. You need to add a few more jars: elephant-bird-core-4.4.jar, elephant-bird-pig-4.4.jar, elephant-bird-hadoop-compat-4.4.jar, json-simple-1.1.1.jar
  2. Register all of them in the script:

    REGISTER elephant-bird-core-4.4.jar;
    REGISTER elephant-bird-pig-4.4.jar;
    REGISTER elephant-bird-hadoop-compat-4.4.jar;
    REGISTER json-simple-1.1.1.jar;

  3. Load the file with the USING keyword and the fully qualified JsonLoader class name:

    Json1 = LOAD 'JSON/sample.tweets' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
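Putting these steps together, a complete script might look like the sketch below. It assumes the four jars are in the directory Pig is launched from (otherwise give REGISTER their full paths), that `JSON/sample.tweets` holds one tweet JSON object per line, and that each tweet has the usual `id_str` and `text` fields. The relation names `tweets` and `fields` are illustrative only.

```pig
-- Elephant-bird 4.4 plus its dependencies; adjust paths if the jars live elsewhere.
REGISTER elephant-bird-core-4.4.jar;
REGISTER elephant-bird-pig-4.4.jar;
REGISTER elephant-bird-hadoop-compat-4.4.jar;
REGISTER json-simple-1.1.1.jar;

-- Assumption: one JSON object (one tweet) per input line.
-- '-nestedLoad' makes nested JSON objects come back as nested maps.
tweets = LOAD 'JSON/sample.tweets'
         USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
         AS (json:map[]);

-- Each record is a single map; look up keys with # and cast the values.
-- 'id_str' and 'text' are assumed tweet fields.
fields = FOREACH tweets GENERATE
           (chararray) json#'id_str' AS id,
           (chararray) json#'text'   AS text;

DESCRIBE fields;
DUMP fields;
STORE fields INTO 'tweeterOutput';
```

Because JsonLoader hands back each record as one map rather than a fixed schema, fields are pulled out with `json#'key'` in a FOREACH before they are queried or stored.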