I'm trying to read Hive files in Pig using HiveColumnarLoader: http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/HiveColumnarLoader.html

The files are binary and begin with the strings RCF, SnappyCodec and hive.io.rcfile.column.number. They are also partitioned across multiple directories (like /day=20140701).

However, a simple script that loads, groups and counts the rows prints nothing. If I add "ILLUSTRATE" like this:

rows = LOAD ... using HiveColumnarLoader ...;
ILLUSTRATE rows;

I get an error like this:

2014-07-17 14:16:43,086 [main] ERROR org.apache.pig.pen.AugmentBaseDataVisitor - No (valid) input data found!
java.lang.RuntimeException: No (valid) input data found!
    at org.apache.pig.pen.AugmentBaseDataVisitor.visit(AugmentBaseDataVisitor.java:583)
    at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:229)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:82)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.walk(PreOrderDepthFirstWalker.java:66)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:180)
    at org.apache.pig.PigServer.getExamples(PigServer.java:1180)
...

I'm not sure whether this is caused by the Snappy compression or by a problem with how I specified the schema (I copied it from the output of Hive's describe table).

Could anyone please confirm that HiveColumnarLoader works with Snappy-compressed files, or propose another approach?

Thanks in advance!

1 Answer

Have you tried the HCatLoader?

rows = LOAD 'tablename' using org.apache.hcatalog.pig.HCatLoader();
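
A minimal sketch of how the whole script might look with HCatLoader, in case it helps. It assumes the table is registered in the Hive metastore under the placeholder name 'tablename', that the partition column is a string column called day (guessed from the /day=20140701 directories), and that Pig is started with the -useHCatalog flag so the HCatalog jars are on the classpath. Adjust the names and the literal's type to match your table.

```pig
-- Start Pig with: pig -useHCatalog
-- 'tablename' is a placeholder; 'day' is an assumed partition column name.
rows = LOAD 'tablename' USING org.apache.hcatalog.pig.HCatLoader();

-- Filtering on the partition column directly after LOAD lets HCatLoader
-- read only the matching partition directories.
one_day = FILTER rows BY day == '20140701';

-- Group everything into a single bag and count the rows.
-- COUNT_STAR counts every tuple; COUNT would skip tuples whose first field is null.
grouped = GROUP one_day ALL;
counted = FOREACH grouped GENERATE COUNT_STAR(one_day);
DUMP counted;
```

With this approach the schema and the storage format are taken from the metastore, so there is no need to copy the schema by hand into the LOAD statement.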