I have a CSV file in which one column can contain ',' (the quoted field in the sample below).

Sample:

23,"we,are",100

23,"you,are",100

The requirement is to load this file into a Hive table (col1 int, col2 array, col3 int).

1 Answer

If your Hive version is 0.14 or above, you can use the CSV SerDe (https://cwiki.apache.org/confluence/display/Hive/CSV+Serde). The DEFAULT_QUOTE_CHARACTER for this SerDe is ", so a comma inside a quoted field such as "we,are" is not treated as a field separator.

If you have an earlier Hive version, try adding this SerDe manually: https://github.com/ogrodnek/csv-serde

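The catch is that the SerDe will treat your array as a string. In fact, OpenCSVSerde reads every column as a string, whatever types the DDL declares. This is not a big problem: you can convert the column into an array when selecting, or create an additional view that does the conversion.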

Example:

DROP TABLE my_table;
CREATE EXTERNAL TABLE my_table(col1 int , col2 string, col3 int)
row format SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
stored as textfile;
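
The DDL above relies on the SerDe's defaults. As a minimal sketch, the same table can be declared with the separator, quote, and escape characters spelled out through SERDEPROPERTIES, which is the place to change them if your file uses different characters. The table name my_table_explicit is hypothetical, and the columns are declared as string because that is how this SerDe exposes them.

```sql
-- Hypothetical table name; the values below are the SerDe's documented defaults.
CREATE EXTERNAL TABLE my_table_explicit (col1 string, col2 string, col3 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",   -- field separator
  "quoteChar"     = "\"",  -- commas inside quoted fields are kept as data
  "escapeChar"    = "\\"   -- escape character inside fields
)
STORED AS TEXTFILE;
```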

I created the text file and put it in the table location.

File content:

23,"we,are",100
23,"you,are",100
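
Instead of copying the file into the table location by hand, it should also be possible to load it from the Hive prompt. This is only a sketch: the path /tmp/sample.csv is an assumption, so substitute wherever your file actually lives.

```sql
-- Assumed local path; LOCAL reads from the client's filesystem rather than HDFS.
LOAD DATA LOCAL INPATH '/tmp/sample.csv' INTO TABLE my_table;
```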

Now, get the data:

hive> select col1, split(col2,",") as col2, col3 from my_table;
OK
23      ["we","are"]    100
23      ["you","are"]   100

Alternatively, you can create a view:

hive> create view my_table_view as select col1, split(col2,",") as col2, col3 from my_table;
OK
Time taken: 0.427 seconds
hive> select * from my_table_view;
OK
23      ["we","are"]    100
23      ["you","are"]   100
Time taken: 0.369 seconds, Fetched: 2 row(s)

To select individual array elements:

hive> select col1,col2[0] as col2_1, col2[1] as col2_2, col3 from my_table_view;
OK
23      we      are      100
23      you     are     100
Time taken: 0.09 seconds, Fetched: 2 row(s)
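
If you need a physical table with the types from the question rather than a view, one possible approach is to copy the data from the CSV-backed table into a typed table. In this sketch the table name my_table_typed is hypothetical and array<string> is an assumed element type, since the question only says "array". The explicit casts are there because OpenCSVSerde returns every column as a string.

```sql
-- Hypothetical target table with the column types asked for in the question.
CREATE TABLE my_table_typed (col1 int, col2 array<string>, col3 int);

-- Convert while copying: cast the numeric columns, split the quoted field.
INSERT INTO TABLE my_table_typed
SELECT cast(col1 AS int),
       split(col2, ','),
       cast(col3 AS int)
FROM my_table;
```

After this, queries such as select col2[0] from my_table_typed work directly on the array column.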