
I am having an issue with the binary data I see in Hive when querying a table stored in SequenceFile format.

I used Sqoop to import data from a database with the following options:

--as-sequencefile --fields-terminated-by '\001' --null-string '\\N' --null-non-string '

I then created a Hive external table pointing to the location where I imported the database data:

CREATE EXTERNAL TABLE if not exists Test(
test_id string,
s_date timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS sequencefile 
LOCATION '<location where I imported the Sqoop data>';
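For reference, the DDL above tells Hive to treat each stored record as a line of text whose fields are separated by \001, with \N meaning NULL. That is the marker the --null-string '\\N' option is meant to produce. The Python model below shows what that parsing amounts to. It is an illustration only, not Hive's implementation, and the names FIELD_DELIM, NULL_MARKER and parse_delimited_row are hypothetical.

```python
from typing import List, Optional

# Conventions taken from the Sqoop options and Hive DDL in the question.
FIELD_DELIM = "\x01"   # '\001', Hive's default field separator
NULL_MARKER = "\\N"    # the two characters backslash + N


def parse_delimited_row(line: str, num_columns: int) -> List[Optional[str]]:
    """Split one text record the way a delimited-text reader would (simplified)."""
    fields = line.rstrip("\n").split(FIELD_DELIM)
    # The null marker becomes SQL NULL (None here).
    values: List[Optional[str]] = [None if f == NULL_MARKER else f for f in fields]
    # Missing trailing columns read as NULL; extra fields are dropped.
    values = values[:num_columns]
    values += [None] * (num_columns - len(values))
    return values


if __name__ == "__main__":
    print(parse_delimited_row("42\x012015-01-01 00:00:00", 2))  # ['42', '2015-01-01 00:00:00']
    print(parse_delimited_row("\\N\x01\\N", 2))                 # [None, None]
```

This model only works if the stored values are plain delimited text. If the files hold something else, such as serialized Java objects, the same DDL cannot produce readable columns.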

I expected Hive to deserialize the data and display it in a readable format, but I see it in a binary, non-readable form.

Do I need any additional steps for Hive to deserialize the data?

Thank you. Nish.


2 Answers


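It appears that Sqoop's SequenceFile output is not compatible with Hive's default SerDe for sequence files. With --as-sequencefile, Sqoop stores each record as an instance of a Java class it generates for the table, not as a line of delimited text, so the ROW FORMAT DELIMITED settings do not apply to the stored values. There is a GitHub project, Hive-Sqoop-Serde, that might be what you need.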
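You can check this before changing the table by reading the SequenceFile header, which records the key and value class names. Hive's default delimited SerDe (LazySimpleSerDe) expects Text or BytesWritable values, so a Sqoop-generated record class in the value slot would explain the problem.

Below is a minimal Python sketch. It assumes the file has been copied locally (for example with hdfs dfs -get) and uses SequenceFile format version 4 or later; current Hadoop writes version 6. The function names and the script name are hypothetical.

```python
import struct
import sys
from typing import Tuple


def read_vlong(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a Hadoop WritableUtils variable-length integer at buf[pos].

    Returns (value, position just after the encoded integer).
    """
    first = struct.unpack_from(">b", buf, pos)[0]  # first byte, read as signed
    if first >= -112:                              # small values fit in one byte
        return first, pos + 1
    negative = first < -120
    size = (-119 - first) if negative else (-111 - first)  # bytes incl. the first
    value = 0
    for b in buf[pos + 1:pos + size]:              # big-endian payload bytes
        value = (value << 8) | b
    return (~value if negative else value), pos + size


def read_text(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a string serialized like Hadoop Text: vint length, then UTF-8 bytes."""
    length, pos = read_vlong(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def sequence_file_classes(header: bytes) -> Tuple[str, str]:
    """Return (key class name, value class name) from the start of a SequenceFile."""
    if header[:3] != b"SEQ":
        raise ValueError("missing 'SEQ' magic: not a SequenceFile")
    if header[3] < 4:
        # Versions before 4 encoded the class names differently; not handled here.
        raise ValueError(f"unsupported SequenceFile version {header[3]}")
    key_class, pos = read_text(header, 4)
    value_class, _ = read_text(header, pos)
    return key_class, value_class


def classes_of_file(path: str) -> Tuple[str, str]:
    """Read the first 4 KiB of a local file, enough for the class names."""
    with open(path, "rb") as f:
        return sequence_file_classes(f.read(4096))


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 seqfile_classes.py part-m-00000
    for path in sys.argv[1:]:
        key_cls, value_cls = classes_of_file(path)
        print(f"{path}: key={key_cls} value={value_cls}")
```

A value class other than org.apache.hadoop.io.Text or org.apache.hadoop.io.BytesWritable is consistent with the incompatibility described above.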


You will have to declare the input and output formats explicitly. Create the table like this:

CREATE EXTERNAL TABLE if not exists Test(
test_id string,
s_date timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
-- The explicit formats replace STORED AS sequencefile; Hive accepts only one STORED AS clause.
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION '<location where I imported the Sqoop data>';