I'm using a Java 1.8 application to create a Parquet file with classes such as org.apache.avro.Schema and org.apache.parquet.hadoop.ParquetWriter.

This is my sample code:

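    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroSchemaConverter;
    import org.apache.parquet.avro.AvroWriteSupport;
    import org.apache.parquet.hadoop.ParquetWriter;
    import org.apache.parquet.hadoop.api.WriteSupport;
    import org.apache.parquet.hadoop.metadata.CompressionCodecName;
    import org.apache.parquet.schema.MessageType;

    final String schemaLocation = ParquerWriterImpl.class.getClassLoader()
            .getResource("parquet-schema/" + ParquetTypes.RISKINFO.getFileType()).getPath();
    final Schema avroSchema = new Schema.Parser().parse(new File(schemaLocation));

    final MessageType parquetSchema = new AvroSchemaConverter().convert(avroSchema);
    final WriteSupport<GenericRecord> writeSupport = new AvroWriteSupport<>(parquetSchema, avroSchema);
    final String parquetPath = PropertyLoader.getPropertyLoader()
            .getProperty(Constants.PROPERTY_MACHINE_FOLDERPATH) + "/" + parquetFileName;
    final ParquetWriter<GenericRecord> parquetWriter = new ParquetWriter<>(new Path(parquetPath),
            writeSupport, CompressionCodecName.SNAPPY, BLOCK_SIZE, 1024);
    final GenericRecord record = new GenericData.Record(avroSchema);
    parquetWriter.write(function.apply(new RiskInfoGen(record)));
    parquetWriter.close(); // flushes the last row group and writes the Parquet footer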

To create this file, I'm using an Avro schema like the one below (excerpt):

      },
      {
        "name": "additional",
        "type": {"type": "map", "values": "string", "default": null}
      },
      {
        "name": "mydate",
        "type": {"type": "int", "logicalType": "date"}
      }

In the POJO class, I'm mapping "mydate" to a Java int.
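Mapping "mydate" to an int is consistent with the schema, because the Avro "date" logical type stores the number of days since the Unix epoch (1970-01-01) in an int. A minimal sketch, assuming the POJO starts from a java.time.LocalDate; the class AvroDates and its method are hypothetical helpers, not part of the question's code:

```java
import java.time.LocalDate;

// Hypothetical helper (not from the question): converts a LocalDate into the
// int value an Avro "date" logical-type field holds (days since 1970-01-01).
final class AvroDates {
    private AvroDates() {}

    // Math.toIntExact throws ArithmeticException instead of silently
    // truncating a day count that does not fit in an int.
    static int toAvroDate(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }
}
```

For example, record.put("mydate", AvroDates.toAvroDate(LocalDate.of(2017, 3, 15))) would store that calendar date in the mydate field. This only produces the right stored value; whether Spark reports the column as date still depends on the writer emitting the Parquet DATE annotation.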

Question 1: Although the Parquet file is created, when I open it with Spark the "mydate" column shows up as int instead of the expected date type.

Please let me know how to make "mydate" a date column in the Parquet schema, e.g.

mydate: date (nullable = true)


1 Answer


I had the same problem. I was using parquet-avro 1.8.1.

Switching to 1.9.0 fixed it for me:

{"name": "birth_date", "type": [{"type": "int", "logicalType" : "date"}, "null"]}"

And I set the value as:

record.put("birth_date", 1);

It shows up as 1970-01-02 in an Apache Spark DataFrame, because a date value is stored as the number of days since the Unix epoch (1970-01-01).
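To see why the value 1 is displayed as 1970-01-02, the stored int can be decoded the same way a reader interprets the date logical type. A small JDK-only sketch; the class DateColumnDemo is a hypothetical illustration, not part of parquet-avro or Spark:

```java
import java.time.LocalDate;

// Hypothetical illustration: decodes the int stored in a date column
// (days since 1970-01-01) into the calendar date a reader displays.
final class DateColumnDemo {
    private DateColumnDemo() {}

    static LocalDate fromAvroDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    public static void main(String[] args) {
        // The value written by record.put("birth_date", 1) above:
        System.out.println(fromAvroDate(1)); // prints 1970-01-02
    }
}
```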