I have a DynamoDB table on AWS containing a bunch of tweets with related data (user, location, etc.). I exported it via Data Pipeline and got a JSON file. Exporting to CSV would be a bad idea, since many of the tweets contain commas in the text fields. New as I am to Hive, I at least know that to load a JSON file I need some kind of SerDe.
This is how I'm creating the table:
create external table tablename (
  id string,
  created_at string,
  followers_count string,
  geo string,
  location string,
  polarity string,
  screen_name string,
  sentiment string,
  subjectivity string,
  tweet string,
  username string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;
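(In case it matters: since this SerDe class lives in the HCatalog core jar, I register it at the start of the session. The path below is just where my distribution happens to put it.)

```sql
-- Register the jar that provides org.apache.hive.hcatalog.data.JsonSerDe.
-- The exact path varies by distribution; this one is an example.
ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
```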
The table is created without errors. I then load the data (the JSON file is stored in HDFS at /user/exam):

load data inpath '/user/exam'
overwrite into table tablename;
When I do "select * from tablename limit 5;" everything comes up NULL:
hive> select * from wcd.tablename limit 5;
OK
{ NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
{ NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
{ NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
{ NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
{ NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
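In case it's relevant: as I understand it, this SerDe expects exactly one complete JSON object per line of the file, so the stray `{` in the first column makes me suspect my export is pretty-printed across multiple lines. Here's a quick sanity check I can run on a local copy of the file (the path is just a placeholder):

```python
import json

def check_json_lines(path):
    """Count how many lines parse as standalone JSON values.

    The Hive JsonSerDe reads one record per line, so any line that
    fails to parse on its own would come back as a row of NULLs.
    Returns a (good, bad) tuple of line counts.
    """
    good, bad = 0, 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                json.loads(line)
                good += 1
            except ValueError:
                bad += 1
    return good, bad
```

If `bad` comes back nonzero, the export itself is the problem rather than the table definition.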
If anyone wants to take a look at the file in question, it's available at:
http://www.vaughn-s.net/hadoop
Any assistance would be greatly appreciated!