I have a set of Parquet files that I would like to add to my Hive instance in HDInsight.

I have created a parquet table (simplified here of course):

create external table parq_test ( 
  A int,
  B int,
  C int
  )
STORED AS PARQUET
LOCATION '/data/parq_test';

I can insert data into this table:

insert into parq_test values ( 1,2,3 );

The file produced by Hive in this folder has the following Parquet schema:

message hive_schema {
  optional int32 a;
  optional int32 b;
  optional int32 c;
}

If I copy in other files whose schema has the same shape:

message hive_schema {
  optional int32 a;
  optional int32 b;
  optional int32 c;
}

I get the following error:

org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.IllegalStateException: Group type [message schema {
  optional int32 a;
  optional int32 b;
  optional int32 c;
}
] does not contain requested field: optional int32 a

I am confused by this error, as the schema very clearly does contain the requested field. Is it not possible to add Parquet files directly to the external table directory?

How do you get the additional files' schema? - David דודו Markovitz
I use parquet-tools to list the schemas. My other Parquet files are created by the fastparquet Python library. - Christian Sloper

1 Answer


I ran into a similar issue as well. On further research, I found that "pyarrow" is the default engine for Parquet in Python. After using this engine to create the Parquet files in Python, I was able to query them in Hive.