1
votes

I'm using the Hadoop Java API to create an ORC file outside of a MapReduce job. The Writer class for ORC files has a method addRow(Object row). My question is: what type should I use for the "row" parameter if I want to be able to query this file using Hive?

https://hive.apache.org/javadocs/r0.12.0/api/org/apache/hadoop/hive/ql/io/orc/Writer.html


1 Answer

0
votes

Creating ORC files in Java is mostly a matter of writing a lot of boilerplate code. The test files in the apache-orc project give a good sense of how to do it. I just released an open-source library, eclectic-orc, that takes care of all that boilerplate using runtime code generation and makes it trivial to serialize Java objects to ORC. It is available on GitHub.
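
On the addRow question specifically: as far as I can tell from Hive's own ORC tests, the row can be a plain Java object, provided the ObjectInspector given to the Writer at creation time describes its fields (the tests obtain one from ObjectInspectorFactory.getReflectionObjectInspector). The sketch below is not Hive code. It is a stdlib-only illustration of the idea behind a reflection-based inspector: deriving a Hive-style struct type from the fields of a row class. MyRow, RowSchemaSketch, hiveTypeName and structTypeOf are hypothetical names made up for this example, and the type mapping only covers a handful of types.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class RowSchemaSketch {

    // Hypothetical row class: one public field per column.
    public static class MyRow {
        public int id;
        public String name;
        public double score;

        public MyRow(int id, String name, double score) {
            this.id = id;
            this.name = name;
            this.score = score;
        }
    }

    // Maps a Java field type to a Hive type name.
    // Assumption: only these few types are handled in this sketch.
    public static String hiveTypeName(Class<?> type) {
        if (type == int.class || type == Integer.class) return "int";
        if (type == long.class || type == Long.class) return "bigint";
        if (type == double.class || type == Double.class) return "double";
        if (type == boolean.class || type == Boolean.class) return "boolean";
        if (type == String.class) return "string";
        throw new IllegalArgumentException("Unsupported field type: " + type.getName());
    }

    // Builds a struct type string such as struct<col:type,...> from the
    // non-static, non-synthetic fields declared on the row class.
    public static String structTypeOf(Class<?> rowClass) {
        List<String> columns = new ArrayList<>();
        for (Field field : rowClass.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            columns.add(field.getName() + ":" + hiveTypeName(field.getType()));
        }
        return "struct<" + String.join(",", columns) + ">";
    }

    public static void main(String[] args) {
        // Prints the struct type derived from MyRow's fields.
        System.out.println(structTypeOf(MyRow.class));
    }
}
```

The point of the sketch is that the row class and the schema have to agree: whatever object you pass to addRow, its fields should line up with the columns you declare on the Hive table that reads the file.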