0
votes

I have a MapReduce job that is launched from an Oozie workflow XML and writes its output in sequence file format (org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat). Is there an equivalent output format for writing Parquet? I could not find one under https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/package-summary.html

Or should I be using a different approach?

Please advise.

Thanks

1
Yes, you can use the parquet-mr dependency, but why not use Spark to write Parquet? - OneCricketeer

1 Answer

0
votes

Oozie shouldn't control what libraries get included in your MapReduce job(s).

ParquetOutputFormat is not built into Hadoop.

It is provided by the Apache Parquet project (parquet-mr), in the parquet-hadoop artifact.

Maven dependency:

<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>${parquet.version}</version>
</dependency>
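
On the Oozie side, the output format is just another job property. Below is a minimal sketch of the configuration block of a map-reduce action, assuming the job uses the new (org.apache.hadoop.mapreduce) API and the example Group-based output format that ships with parquet-hadoop. The schema and its field names (word, count) are hypothetical, and the property names are as I recall them from parquet-hadoop, so check them against the version you depend on.

```xml
<!-- Fragment of an Oozie <map-reduce> action; all values are illustrative -->
<configuration>
    <!-- Tell Oozie the job uses the new MapReduce API -->
    <property>
        <name>mapred.mapper.new-api</name>
        <value>true</value>
    </property>
    <property>
        <name>mapred.reducer.new-api</name>
        <value>true</value>
    </property>
    <!-- Takes the place of SequenceFileOutputFormat -->
    <property>
        <name>mapreduce.job.outputformat.class</name>
        <value>org.apache.parquet.hadoop.example.ExampleOutputFormat</value>
    </property>
    <!-- Parquet needs a schema; this one is a hypothetical placeholder -->
    <property>
        <name>parquet.example.schema</name>
        <value>message example { required binary word (UTF8); required int64 count; }</value>
    </property>
</configuration>
```

With this setup the reducer would have to emit org.apache.parquet.example.data.Group values (the output key is ignored), and the Parquet jars still need to be on the job's classpath, for example in the workflow application's lib/ directory. If your records are Avro or Thrift objects, the corresponding output formats from parquet-avro or parquet-thrift are probably a better fit than the example one.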