2
votes

I tried creating a table in Hive and wanted to export it in Avro format.

Eventually I want to load this Avro file into Google BigQuery. For some reason, after the export the Avro schema does not have the correct column names.

create table if not exists test (id int, name varchar(40));
insert into test values (1, "AK");
insert overwrite directory "/tmp/test" stored as avro select * from test;
!sh hadoop fs -cat /tmp/test/*;

The output should have the column names id and name, but they are translated to _col0 and _col1.

Objavro.schema▒{"type":"record","name":"baseRecord","fields":[{"name":"_col0","type":["null","int"],"default":null},{"name":"_col1","type":["null",{"type":"string","logicalType":"varchar","maxLength":40}],"default":null}]}▒Bh▒▒δ*@▒x~ AK▒Bh▒▒δ*@▒x~
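
(For reference, the embedded schema can also be dumped on its own with the avro-tools getschema command; the jar version and the 000000_0 part-file name below are illustrative and will vary by setup.)

!sh hadoop jar avro-tools-1.8.2.jar getschema /tmp/test/000000_0;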

Thanks,

AK

2
Also, it is the same output for tables stored as Avro! - AKS

2 Answers

1
vote

If an Avro binary file needs to be exported as a single file for further ingestion (in my context, into BigQuery), don't use hadoop fs -cat or insert overwrite statements. Use avro-tools and its concat command to merge the part files into one big Avro file.

hadoop jar avro-tools-1.8.2.jar concat /tmp/test_avro/* big_avro_table.avro
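
Once the files are merged, a minimal follow-up for the BigQuery part might look like this (mydataset.test is an assumed dataset/table name, and big_avro_table.avro is assumed to land in your HDFS home directory):

# copy the merged file out of HDFS, then load it into BigQuery
hadoop fs -get big_avro_table.avro .
bq load --source_format=AVRO mydataset.test big_avro_table.avro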

0
votes

This appears to be the intended behaviour when exporting with the insert overwrite directory clause. This older thread discusses the same issue; it is dated, but I believe the conclusion still holds (at least I could not find a direct way to keep the column names). It does include a couple of hacks to work around the problem, so it may be worth reading through.
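
One workaround along those lines (a sketch only, not necessarily one of the hacks from the linked thread; the table name test_avro and the default warehouse path are placeholders) is to write into a named table stored as Avro instead of a directory, since the Avro schema Hive generates for a real table keeps the declared column names, and then export that table's files:

create table if not exists test_avro (id int, name varchar(40)) stored as avro;
insert into test_avro select id, name from test;
-- the files under the table's location should now embed "id" and "name"
!sh hadoop fs -ls /user/hive/warehouse/test_avro/;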