0
votes

I have a DynamoDB on Amazon containing a bunch of tweets with related data (user, location, etc.). I exported this via pipeline and got a json file. Exporting it to csv would be a bad idea since many of the tweets contain commas in the text fields. As new to Hive as I am, I at least know that to load a json file, I need some kind of SerDe.

This is how I'm creating the table:

create external table tablename (
id string,
created_at string,
followers_count string,
geo string,
location string,
polarity string,
screen_name string,
sentiment string,
subjectivity string,
tweet string,
username string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
SAVE AS TEXTFILE ;

I don't get any errors, but then I do:

load data inpath '/user/exam'
overwrite into table tablename;

(this is where the json file is stored)

When I do "select * from tablename limit 5;" everything comes up NULL:

hive> select * from wcd.tablename limit 5;
OK
{   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
{   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
{   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
{   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
{   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL

If anyone wants to take a look at the file in question, it's available at:

http://www.vaughn-s.net/hadoop

Any assistance would be greatly appreciated!

1
can you put some example of your data? - hlagos
Yes, there's a link to the entire file at the bottom of my post; I can post snippits if you prefer, though. - lengthy_preamble

1 Answers

3
votes

The reason is because your json doesn´t follow your table definition

{"id":{"s":"894643473017561088"},"sentiment":{"s":"neutral"},"subjectivity":{"s":"0.0"},"username":{"s":"Jessi"},"geo":{"s":"None"},"location":{"s":"Valley of the sun☀ï¸"},"polarity":{"s":"0.0"},"tweet":{"s":"b\"RT @bannerite: Donald Trump's lies have consequences. We're seeing them now | Charlotte Observer #DemForce https""},"created_at":{"s":"Mon Aug 07 19:36:40
+0000 2017"},"screen_name":{"s":"JessiAtkins06"},"followers_count":{"s":"19"}}

Try to put each column with a structure with a s string, for example

 id struct<s:string>