Writing snappy compressed data to a hive table

Question

I've created a Hive table and now I want to load Snappy-compressed data into it. To do so, I ran the following:

SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
CREATE TABLE toydata_table (id STRING, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

Then I created a CSV file called toydata.csv with the following content:

A,Value1
B,Value2
C,Value3

I compressed this file with snzip (https://github.com/kubo/snzip) by running:

/usr/local/bin/snzip -t snappy-java toydata.csv

which produces toydata.csv.snappy. I then returned to the Hive CLI and loaded the data with:

LOAD DATA LOCAL INPATH "toydata.csv.snappy" INTO TABLE toydata_table;

When I query the table, I get the following error message:

hive> select * from toydata_table;
OK
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:189)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:175)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:433)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:515)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1471)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

I did exactly the same thing with gzip, and that works fine. So why does the Snappy version fail?

It looks like you have not installed Snappy on your cluster. Which versions of Hadoop and Hive are you using? You can check here for how to install Snappy on your cluster: code.google.com/p/hadoop-snappy - zsxwing
I'm using Hadoop 2.2 and Hive 0.12. I installed everything on my cluster and restarted it, but I still get the same error message. - toom

Sachin Janani · Accepted Answer · 2014-08-07T18:11:34

Please install the Snappy compression codec on your cluster. To confirm whether Snappy is installed, look for the libsnappy.so file in your library directories. You also need to start the Hive shell with the --auxpath parameter and provide the Snappy JAR, e.g.: hive --auxpath /home/user/snappy1.0.4.1.jar
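
The check for libsnappy.so can be sketched as a small shell helper. This is only a rough sketch: the function names find_libsnappy and check_hadoop_native are made up for illustration, the default search directories are assumptions about a typical Linux install, and the second function assumes your Hadoop version provides the checknative subcommand.

```shell
#!/bin/sh
# Sketch: search a list of directories for the native Snappy library.
# The default directories are assumptions; adjust them for your cluster.
find_libsnappy() {
    # Use the arguments as search directories, or fall back to common defaults.
    if [ "$#" -eq 0 ]; then
        set -- /usr/lib /usr/lib64 /usr/local/lib "${HADOOP_HOME:-/nonexistent}/lib/native"
    fi
    for dir in "$@"; do
        # Skip directories that do not exist on this machine.
        [ -d "$dir" ] || continue
        # Print every file whose name starts with libsnappy.so (e.g. libsnappy.so.1).
        find "$dir" -name 'libsnappy.so*' 2>/dev/null
    done
}

# Sketch: ask Hadoop which native libraries it can load, if hadoop is on the PATH.
# Look for the "snappy" line in the output.
check_hadoop_native() {
    if command -v hadoop >/dev/null 2>&1; then
        hadoop checknative -a
    else
        echo "hadoop command not found on PATH" >&2
        return 1
    fi
}
```

Calling find_libsnappy with no arguments searches the default directories; calling it with explicit directories (for example find_libsnappy /opt/hadoop/lib/native, a hypothetical path) restricts the search. If it prints nothing, the library was not found in those locations.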
