
I have an error when I try to load or save my data from Apache Pig into anything but CSV. Here is my pig code:

REGISTER /usr/local/Cellar/pig/0.15.0/libexec/*.jar
REGISTER /usr/local/Cellar/pig/0.15.0/libexec/lib/*.jar
REGISTER /usr/local/Cellar/hbase/1.1.2/libexec/lib/*.jar
REGISTER /usr/local/Cellar/hadoop/2.7.2/libexec/share/hadoop/common/*.jar
REGISTER /usr/local/Cellar/hadoop/2.7.2/libexec/share/hadoop/common/lib/*.jar
REGISTER /usr/local/Cellar/hadoop/2.7.2/libexec/share/hadoop/mapreduce/*.jar
REGISTER /usr/local/Cellar/hadoop/2.7.2/libexec/share/hadoop/mapreduce/lib/*.jar
REGISTER /usr/local/Cellar/pig/0.15.0/libexec/lib/piggybank.jar

--generated_data = LOAD 'tableHive' USING org.apache.hive.hcatalog.pig.HCatLoader(',') AS (level:chararray, score:INT, attraction:chararray);

generated_data = LOAD 'CSVResults/ireland.csv' USING PigStorage(',') AS (level:chararray, score:INT, attraction:chararray);
DUMP generated_data;
fiveRating = FILTER generated_data BY (float)score>4;

level6 = FILTER fiveRating BY (float)level>5;

groupedbylevel = group level6 by attraction;

countAttractions = FOREACH groupedbylevel {
    level6Attractions = CROSS level6.level;
    generate group, COUNT(level6Attractions) AS listBylevel6;
};

orderlist = ORDER countAttractions BY listBylevel6 DESC;

limitorder = LIMIT orderlist 20;

STORE limitorder into 'Level6AttractionsIreland-limited2' using PigStorage(',');

STORE countAttractions into 'hbase://Level6AttractionsIreland' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('Ireland:level Ireland:score Ireland:attraction');
STORE countAttractions INTO 'Level6AttractionsIreland' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(',');

and here is the error from the pig log file:

Pig Stack Trace
---------------
ERROR 2999: Unexpected internal error. java.io.IOException: java.lang.reflect.InvocationTargetException

java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:211)
    at org.apache.pig.backend.hadoop.hbase.HBaseStorage.getOutputFormat(HBaseStorage.java:928)
    at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:69)
    at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.relational.LogicalPlan.validate(LogicalPlan.java:212)
    at org.apache.pig.PigServer$Graph.compile(PigServer.java:1767)
    at org.apache.pig.PigServer$Graph.access$300(PigServer.java:1443)
    at org.apache.pig.PigServer.execute(PigServer.java:1356)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:415)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:398)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:631)
    at org.apache.pig.Main.main(Main.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:459)
    at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:436)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:317)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:198)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:160)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:206)
    ... 25 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:457)
    ... 30 more
Caused by: java.lang.NoClassDefFoundError: org/cloudera/htrace/Trace
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:218)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:481)
    at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:83)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.retrieveClusterId(HConnectionManager.java:907)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:701)
    ... 35 more
Caused by: java.lang.ClassNotFoundException: org.cloudera.htrace.Trace
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 41 more
================================================================================

As you can see, I've tried to REGISTER every jar file that might be relevant. I've also stripped the script down to just the LOAD and STORE commands to see whether the code itself was the cause, but I get the same result. I'm very new to Pig, so apologies if this is a silly mistake; I've searched for answers elsewhere, but nothing has worked for me so far. I'm on a Mac with Hadoop, HBase and Hive installed locally, and I'm running the command 'pig -x local test.pig' in the terminal. Any advice would be great, thanks!

Caused by: java.lang.NoClassDefFoundError: org/cloudera/htrace/Trace... Does that mean anything to you? - OneCricketeer
Thanks for your reply; to be honest, none of it means much to me. I did install Cloudera's VM at one stage, which I was going to use for a college project, but I'm nervous about uninstalling it in case I break something. Do you think the VM could be causing a problem? I saw that error, but I thought it was caused by something closer to the top of the errors, if you know what I mean. - user5502557
It's a VM, so you can't break anything. The root cause of an error in a stacktrace is at the bottom. That's why it's called a "stack" trace. Anything after the error is added "on top" of the others. - OneCricketeer
Are you using the cloudera or hortonworks VM? Based on the answers in this post, you are missing some HBase jar file(s). I'm not sure why you are running the Pig code on your Mac (based on /usr/local/Cellar). You should run code from the VM, where all the environment variables and paths are correctly setup. - OneCricketeer
No, I'm not using a VM at all; I've installed Hadoop etc. on the Mac. I was only planning on using Cloudera as a fallback if I had a problem on the Mac, so I installed it, but I'm not using it. I've installed Pig, HBase and Hadoop on the Mac and I'm using that. Sorry if that was confusing! - user5502557

1 Answer


I'm not sure where you grabbed your dependencies from, but your code is looking for a Cloudera package that doesn't exist on your classpath, hence the error:

Caused by: java.lang.ClassNotFoundException: org.cloudera.htrace.Trace

I just installed HBase with brew install hbase, and the correct class is org.apache.htrace.Trace, located in /usr/local/Cellar/hbase/.../htrace-core-*.jar
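If you want to keep running against the local Mac install, one thing worth trying (a sketch, not verified on your setup; adjust the version numbers in the path to match your install) is to REGISTER that htrace jar explicitly at the top of your script, alongside your other REGISTER lines:

```
REGISTER /usr/local/Cellar/hbase/1.1.2/libexec/lib/htrace-core-*.jar
```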

So, I would recommend downloading the latest version of the HBase libraries.
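If it's unclear which variant of the class your jars actually provide, a quick standalone check (a sketch; the two class names are just the Cloudera and Apache packagings discussed above) is:

```java
public class HtraceCheck {
    // Returns true if the named class can be loaded from the current classpath.
    static boolean isPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Old Cloudera-packaged name (what the stack trace is asking for)
        System.out.println("org.cloudera.htrace.Trace: " + isPresent("org.cloudera.htrace.Trace"));
        // Apache name shipped in newer htrace-core jars
        System.out.println("org.apache.htrace.Trace: " + isPresent("org.apache.htrace.Trace"));
    }
}
```

Compile it and run it with the same jars on the classpath that Pig registers; whichever name prints true is the one your jars provide, and if both print false, the htrace jar isn't on the classpath at all.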