On using complex types like struct, it is recommended to use a unique delimiter for collection than the one used for fields (columns).
Consider a csv file in below format where “,” comma separator is used.
Input.csv
Code, area_name,num,total,perc,num,total,perc,num,total,perc
1100,Albury,90,444,17.4,73,546,13.4,86,546,15.8
1111,armid,40,404,14.4,97,701,13.8,76,701,10.8
Expected result is to create a complex type out of fields (num, total and perc):
1100,Albury,struct<90,444,17.4>,struct<73,546,13.4>,struct<86,546,15.8>
1111,armid, struct<40,404,14.4>, struct<97,701,13.8>,struct<76,701,10.8>
when we try to create a complex type out of fields (num, total and perc) in this case using following hive query, we will get multiple null values in the table since the same “,” comma delimiter is used for both fields and collections, so Hive query failed to segregate the data as we required.
Hive> create table aus_aboriginal( code int, area_name string, male_0_4 STRUCT<num:double, total:double, perc:double>, male_5_9 STRUCT<num:double, total:double, perc:double>, male_10_14 STRUCT<num:double, total:double, perc:double>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ',' LOCATION '/csv';
Output:
1100 Albury {"num":90.0,"total":null,"perc":null} {"num":444.0,"total":nul
l,"perc":null} {"num":17.4,"total":null,"perc":null}
1111 armid {"num":40.0,"total":null,"perc":null} {"num":404.0,"total":nul
l,"perc":null} {"num":14.4,"total":null,"perc":null}
Time taken: 0.15 seconds, Fetched: 2 row(s)
I am suspecting that you are facing this problem.
Usage of Struct
Now consider the input file having data in below format, where “,” comma delimiter is used for fields and for collection items “#” is used as delimiter.
1100,Albury,90#444#17.4,73#546#13.4,86#546#15.8
1111,armid,40#404#14.4,97#701#13.8,76#701#10.8
In this case, we can successfully create a table with complex type by specifying # as delimiter for collection items and , for fields. Please check below hive query.
hive> create table aus_aboriginal( code int, area_name string, male_0_4 STRUCT<num:double, total:double, perc:double>, male_5_9 STRUCT<num:double, total:double, perc:double>, male_10_14 STRUCT<num:double, total:double, perc:double>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '#' LOCATION '/csv';
Output:
hive> select * from aus_aboriginal;
1100 Albury {"num":90.0,"total":444.0,"perc":17.4} {"num":73.0,"total":546.
0,"perc":13.4} {"num":86.0,"total":546.0,"perc":15.8}
1111 armid {"num":40.0,"total":404.0,"perc":14.4} {"num":97.0,"total":701.
0,"perc":13.8} {"num":76.0,"total":701.0,"perc":10.8}
Time taken: 0.146 seconds, Fetched: 2 row(s)
Similar approach should be taken for other complex types as well, refer below link for more information.
Reference:
http://edu-kinect.com/blog/2014/06/16/hive-complex-data-types-with-examples/