I have an csv file and its column can contain ',' bold in sample
sample
23,"we,are",100
23,"you,are",100
Requirement is load to an hive table (col1 int ,col2 array, col3 int) ;
If your Hive version is 0.14 and above you can use CSV Serde (https://cwiki.apache.org/confluence/display/Hive/CSV+Serde). DEFAULT_QUOTE_CHARACTER for this SerDe is "
If you have previous Hive version, try to add this serde manually https://github.com/ogrodnek/csv-serde
The thing is Serde will treat your array as a string. This is not very big problem, you can convert column into array when doing select or create additional view for the same.
Example:
DROP TABLE my_table;
CREATE EXTERNAL TABLE my_table(col1 int , col2 string, col3 int)
row format SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
stored as textfile;
I created the text file and put it in the table location.
File content:
23,"we,are",100
23,"you,are",100
Now, get the data:
hive> select col1, split(col2,",") as col2, col3 from my_table;
OK
23 ["we","are"] 100
23 ["you","are"] 100
Alternatively you can create a view:
hive> create view my_table_view as select col1, split(col2,",") as col2, col3 from my_table;
OK
Time taken: 0.427 seconds
hive> select * from my_table_view;
OK
23 ["we","are"] 100
23 ["you","are"] 100
Time taken: 0.369 seconds, Fetched: 2 row(s)
--Select array elements:
hive> select col1,col2[0] as col2_1, col2[1] as col2_2, col3 from my_table_view;
OK
23 we are 100
23 you are 100
Time taken: 0.09 seconds, Fetched: 2 row(s)