I want to export data from a server to Hive. The data is nested three levels deep and is represented as Java classes. I was able to generate an Avro schema using Avro's ReflectData and write the data out to Avro files using ReflectDatumWriter. In Hive, I was able to create a table and specify the schema using
TBLPROPERTIES
('avro.schema.url'='hdfs:///schema.avsc');
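For context, a minimal sketch of what a complete statement of this kind typically looks like. The table name `exported_data` and the `LOCATION` path are hypothetical placeholders; only the `TBLPROPERTIES` line is taken from my actual table. The SerDe and input/output format classes are the standard Hive Avro ones, and the column list is omitted because Hive derives it from the schema file.

```sql
-- Hypothetical table name and LOCATION; only the TBLPROPERTIES line comes from the post.
-- No column list is given: the Avro SerDe reads the columns from the .avsc schema.
CREATE EXTERNAL TABLE exported_data
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS
    INPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION '/path/to/avro/files'
  TBLPROPERTIES ('avro.schema.url'='hdfs:///schema.avsc');
```

This is the convenience I would like to keep for Parquet: the nested schema lives in one file and the table definition stays short.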
I can see there is a way to export the same data in Parquet format: http://blog.cloudera.com/blog/2014/05/how-to-convert-existing-data-into-parquet/
Say I get that done and have the same data in Parquet files. How do I query the exported Parquet data in Hive, and how do I specify the schema for Hive? I don't want to write a huge CREATE TABLE statement in Hive with the whole nested schema. How do I specify null values for some members of the schema? Is there a way to create a Parquet schema directly, like the Avro schema, and give it to Hive in the CREATE TABLE statement?