Can an Oozie MapReduce job save data in Parquet format?

Question

I have a MapReduce job that is launched from an Oozie workflow XML and writes its output in sequence file format (org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat). Is there an equivalent output format for saving in Parquet? I could not find one under https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/package-summary.html

Or should I be using a different approach?

Please advise.

Thanks

Yes, you can use parquet-mr dependency, but why not use Spark to write Parquet? — OneCricketeer

OneCricketeer · Accepted Answer · 2021-08-05T15:30:47

Oozie shouldn't control what libraries get included in your MapReduce job(s).

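ParquetOutputFormat is not built into Hadoop, which is why it does not appear in the Hadoop API docs.

It is provided by the parquet-hadoop module of the parquet-mr project.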

Maven dependency

<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>${parquet.version}</version>
</dependency>
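
With that jar packaged into the job (for example, placed in the workflow application's lib/ directory), the output format can be selected in the Oozie map-reduce action the same way SequenceFileOutputFormat is. The fragment below is only a rough sketch under several assumptions: the action name, the ${jobTracker}, ${nameNode} and ${outputDir} parameters, and the two-column schema are all hypothetical, and it uses the ExampleOutputFormat class bundled with parquet-hadoop, which expects the job to emit Group values and reads its schema from the parquet.example.schema property. A real job would more likely use its own WriteSupport or the Avro/Thrift bindings.

```xml
<!-- Hypothetical action; names and ${...} parameters are placeholders -->
<action name="mr-to-parquet">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Use the new (org.apache.hadoop.mapreduce) API -->
            <property>
                <name>mapred.mapper.new-api</name>
                <value>true</value>
            </property>
            <property>
                <name>mapred.reducer.new-api</name>
                <value>true</value>
            </property>
            <!-- Mapper/reducer class properties omitted; they are job-specific -->
            <!-- Swap SequenceFileOutputFormat for a ParquetOutputFormat subclass -->
            <property>
                <name>mapreduce.job.outputformat.class</name>
                <value>org.apache.parquet.hadoop.example.ExampleOutputFormat</value>
            </property>
            <!-- Illustrative schema only; replace with the job's real schema -->
            <property>
                <name>parquet.example.schema</name>
                <value>message example { required binary key (UTF8); required int64 count; }</value>
            </property>
            <property>
                <name>mapreduce.output.fileoutputformat.outputdir</name>
                <value>${outputDir}</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Oozie itself only passes these properties through to the job configuration; the Parquet classes still have to be on the job's classpath, which is the point of the dependency above.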
