I'm new to Hadoop, Hortonworks, and Pig, so please excuse the basic question.
I have installed the Hortonworks Sandbox. I'm trying to load a Twitter JSON file and run some queries on it, but I'm currently stuck at the file-loading step.
I know that I should use Elephant Bird to load a JSON file with JsonLoader() without specifying the JSON schema, so I've downloaded Elephant Bird from the Git repo and included the jar file
Elephant-bird\repo\com\twitter\elephant-bird\2.2.3\elephant-bird-2.2.3.jar
inside the Hortonworks Sandbox. Here is my Pig script:
REGISTER elephant-bird-2.2.3.jar;
Json1 = LOAD 'JSON/sample.tweets' JsonLoader();
DESCRIBE Json1;
STORE Json1 INTO 'tweeterOutput';
Unfortunately, I cannot get any results from running this script. I've tried both the STORE and DUMP commands.
I'm probably doing several things wrong in this process, so any help would be appreciated!
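For comparison, here is a minimal sketch of how the load step is usually written in Pig Latin. It assumes the loader class lives at com.twitter.elephantbird.pig.load.JsonLoader in this Elephant Bird version, that a json-simple jar is also needed on the classpath (the file name and version below are assumptions, not something I have verified on the Sandbox), and that each tweet carries a 'text' field. Note the USING keyword and the fully qualified class name, both of which are missing from the script above.

```pig
-- Assumption: both jars were uploaded next to the script.
REGISTER elephant-bird-2.2.3.jar;
REGISTER json-simple-1.1.jar;  -- hypothetical file name/version for the JSON parser dependency

-- LOAD takes the loader after the USING keyword; the class name is fully qualified.
-- Each input line is expected to be one JSON object, loaded as a single map.
Json1 = LOAD 'JSON/sample.tweets'
        USING com.twitter.elephantbird.pig.load.JsonLoader()
        AS (json:map[]);

DESCRIBE Json1;

-- Assumption: tweets have a 'text' key; map values are read with the # operator.
Texts = FOREACH Json1 GENERATE json#'text' AS text;

DUMP Texts;
```

If the class or the extra jar turns out to be named differently in this release, the jar's contents would show the actual package path.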