
I want to export data from a server to Hive. I have 3-level nested data in the form of Java classes. I was able to create an Avro schema using Avro Tools' ReflectData and write the data out to Avro files using ReflectDatumWriter. In Hive I was able to create a table and specify the schema using

TBLPROPERTIES 
  ('avro.schema.url'='hdfs:///schema.avsc');
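For context, this is roughly what my export step looks like. It is only a minimal sketch; `Record` stands in for my real 3-level nested classes:

    import java.io.File;

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.reflect.ReflectData;
    import org.apache.avro.reflect.ReflectDatumWriter;

    public class AvroExport {
        public static void main(String[] args) throws Exception {
            // Derive the Avro schema from the Java class via reflection
            Schema schema = ReflectData.get().getSchema(Record.class);

            // Write records to an Avro container file
            ReflectDatumWriter<Record> datumWriter = new ReflectDatumWriter<>(schema);
            try (DataFileWriter<Record> fileWriter = new DataFileWriter<>(datumWriter)) {
                fileWriter.create(schema, new File("data.avro"));
                fileWriter.append(new Record(/* ... */));
            }
            // schema.toString(true) is the .avsc content referenced by avro.schema.url
        }
    }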

I can see there is a way to export the same data in Parquet format: http://blog.cloudera.com/blog/2014/05/how-to-convert-existing-data-into-parquet/
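I assume the conversion would look something like the sketch below, using the parquet-avro library to reuse the reflection-derived Avro schema (again, `Record` is a placeholder for my real classes, and I haven't verified this end to end):

    import org.apache.avro.Schema;
    import org.apache.avro.reflect.ReflectData;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    public class ParquetExport {
        public static void main(String[] args) throws Exception {
            Schema schema = ReflectData.get().getSchema(Record.class);

            // Reuse the reflection-derived Avro schema to write Parquet files
            try (ParquetWriter<Record> writer = AvroParquetWriter.<Record>builder(new Path("data.parquet"))
                    .withSchema(schema)
                    .withDataModel(ReflectData.get()) // serialize POJOs via reflection
                    .build()) {
                writer.write(new Record(/* ... */));
            }
        }
    }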

Let's say I get that done and have the same data in Parquet files. How do I query this exported Parquet data in Hive? How do I specify the schema for Hive? I don't want to write a huge CREATE TABLE statement in Hive with the whole nested schema. How do I specify null values for some members of the schema? Is there a way I can directly create a Parquet schema, like the Avro schema, and give it to Hive in a CREATE TABLE statement?


1 Answer


To query the data in Hive you can create a Hive external table and specify the location of the files, like this:

CREATE EXTERNAL TABLE XXX (...)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
LOCATION '/your/location/here';

There is no way to avoid writing this statement, because the files are generated independently of the Hive metastore, and all you can do with Avro is generate the data files themselves.