I am using Cascading to convert delimited text files to Parquet and Avro files. I have not found a way to attach column descriptions to the Parquet metadata the way Avro allows with its "doc" attribute. Having them would help anyone using the data set, because the field descriptions would be available in the data set itself.
Below is the Parquet schema:
message LaunchApplication {
required int32 field1;
required binary field2;
optional binary field3;
required binary field4;
}
Below is the Avro schema:
{ "type":"record", "name":"CascadingAvroSchema", "namespace":"", "fields":[
{"name":"field1","type":"int","doc":"10,NOT NULL, KeyField"},
{"name":"field2","type":"string","doc":"5,NOT NULL, FLAG, Indicator"},
{"name":"field3","type":["null","string"],"doc":"20,NULL, System Field."},
{"name":"field4","type":"string","doc":"20,NOT NULL,MM/DD/YYYY,Record Changed Date."} ]
}
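To make the goal concrete, the per-field descriptions in question can be read out of the Avro schema as a simple name-to-description mapping. The following is only a minimal sketch in Python using the standard library; the names `AVRO_SCHEMA` and `field_docs` are hypothetical and are not part of Cascading, Avro, or Parquet.

```python
import json

# The Avro schema from the post, written as valid JSON.
AVRO_SCHEMA = """
{ "type":"record", "name":"CascadingAvroSchema", "namespace":"", "fields":[
{"name":"field1","type":"int","doc":"10,NOT NULL, KeyField"},
{"name":"field2","type":"string","doc":"5,NOT NULL, FLAG, Indicator"},
{"name":"field3","type":["null","string"],"doc":"20,NULL, System Field."},
{"name":"field4","type":"string","doc":"20,NOT NULL,MM/DD/YYYY,Record Changed Date."} ]
}
"""


def field_docs(schema_text):
    """Return {field name: doc string} for every field that carries a "doc" attribute."""
    schema = json.loads(schema_text)
    return {f["name"]: f["doc"] for f in schema["fields"] if "doc" in f}


if __name__ == "__main__":
    # Print one "name: description" line per documented field.
    for name, doc in field_docs(AVRO_SCHEMA).items():
        print(f"{name}: {doc}")
```

A mapping like this is the information that would have to be carried into the Parquet file alongside the column definitions.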
How do I carry the "doc" attributes from the Avro schema over to the Parquet file as well?