
I am trying to move some very large Hive tables from text format to ORC format, in the hope that they will take less storage and be faster to query. Since we use LZO compression across our estate, I was trying to use it for the ORC tables as well.

I would be grateful for any help you guys can provide.

I am using Hadoop 2.4.0 and Hive 0.13.1

As per the links below, it seems that it should be possible to use LZO compression with ORC format:

http://2013.berlinbuzzwords.de/sessions/orc-file-improving-hive-data-storage
https://hive.apache.org/javadocs/r1.1.0/api/ql/org/apache/hadoop/hive/ql/io/orc/package-summary.html

But when I do something like

create table sa_orc_lzo 
stored as orc  tblproperties ("orc.compress"="LZO")  
as select * from sa;

I get the error below:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"data".......}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)

Caused by: java.lang.IllegalArgumentException: LZO is not available.
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.createCodec(WriterImpl.java:200)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:175)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:369)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:83)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:649)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 9 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.orc.LzoCodec
Can someone please help me with this? Any guidance is much appreciated.

deepto

1 Answer


Unfortunately, you can't. Someone needs to implement org.apache.hadoop.hive.ql.io.orc.LzoCodec, and that hasn't been done yet, which is why the writer fails with the ClassNotFoundException at the bottom of your stack trace. I put the hooks in when I wrote ORC and intended to get back to it, but I haven't, and at the moment I've got way too much on my plate. My suggestion is to use Snappy instead. Snappy is almost as fast as LZO and works great.

Part of the reason it hasn't been done is that LZO is GPL'ed, and thus any bindings to it have to be released under the GPL. Since the Apache license is incompatible with the GPL, the binding can't be hosted at Apache.
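
For the Snappy suggestion, a minimal sketch of the statement from the question with only the codec changed. The target table name sa_orc_snappy is a placeholder chosen for illustration. SNAPPY is one of the values orc.compress accepts, alongside NONE and ZLIB (the default).

```sql
-- Same CTAS as in the question, with Snappy in place of LZO.
-- sa_orc_snappy is a hypothetical table name; sa is the source table from the question.
create table sa_orc_snappy
stored as orc tblproperties ("orc.compress"="SNAPPY")
as select * from sa;
```

Comparing the size of the resulting table directory against a ZLIB-compressed copy of the same data should show whether Snappy's speed is worth its typically lower compression ratio for your tables.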