
I've just started experimenting with Hadoop/HBase/Pig, so I'm really new at this, but I can't seem to find straightforward info on a problem I'm encountering, and I'm absolutely stuck.

I'm trying to load data from HBase using Pig but I'm getting the error:

Pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.backend.hadoop.hbase.HBaseStorage' with arguments '[info:*]'

When this code runs:

raw = LOAD 'hbase://testTable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:*') as (id:int); 

From what I've found, it might be that I'm not registering some jar, or it may have something to do with the HBase/Pig versions. Here's the complete script:

REGISTER /usr/local/hbase-1.1.2/lib/hbase-common-1.1.2.jar
REGISTER /usr/local/hbase-1.1.2/lib/hbase-client-1.1.2.jar
REGISTER /usr/local/hbase-1.1.2/lib/hbase-server-1.1.2.jar
REGISTER /usr/local/hbase-1.1.2/lib/hbase-protocol-1.1.2.jar
REGISTER /usr/local/hbase-1.1.2/lib/htrace-core-3.1.0-incubating.jar
REGISTER /usr/local/hbase-1.1.2/lib/zookeeper-3.4.6.jar
REGISTER /usr/local/hbase-1.1.2/lib/guava-12.0.1.jar
REGISTER /usr/local/hbase-1.1.2/lib/hbase-hadoop2-compat-1.1.2.jar
REGISTER /usr/local/hbase-1.1.2/lib/hbase-annotations-1.1.2.jar
REGISTER /usr/local/hbase-1.1.2/lib/hbase-thrift-1.1.2.jar

set hbase.zookeeper.quorum 'localhost'

raw = LOAD 'hbase://testTable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:*') as (id:int);

And the versions I'm using:

  • Hadoop 2.5.1
  • HBase 1.1.2
  • Pig 0.15.0

Any insight on where the problem might be and what I can check would be greatly appreciated!
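As a side note (a hedged guess, since the error points at instantiation rather than the schema): with a column-family glob like info:*, HBaseStorage returns the whole family as a single map, so the AS clause usually declares a map field, optionally with -loadKey to expose the row key. A sketch, assuming the same table:

```
-- Sketch: 'info:*' yields the column family as a map;
-- '-loadKey true' adds the row key as the first field.
raw = LOAD 'hbase://testTable'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:*', '-loadKey true')
      AS (id:bytearray, info:map[]);
```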

Update

In .bashrc I have set HBASE_HOME to the HBase installation directory:

export HBASE_HOME=/usr/local/hbase-1.1.2

I found that this error can happen when Pig uses HBase jars of a different version. I looked in the /pig/lib/h2 folder and found HBase jars for 0.98.12, but I have 1.1.2 installed. In that case, would you just replace those files in the Pig folder with the ones from the HBase folder?
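Rather than overwriting Pig's bundled jars, another option (a sketch, not something I've verified on this setup) is to put the installed HBase jars first on Pig's classpath via PIG_CLASSPATH so they shadow the bundled 0.98 ones; the paths below assume the layout from the script above:

```shell
# Sketch: prepend the installed HBase 1.1.2 jars to Pig's classpath
# so they take precedence over the 0.98.12 jars bundled with Pig.
HBASE_HOME=/usr/local/hbase-1.1.2
PIG_CLASSPATH=""
for jar in hbase-common hbase-client hbase-server hbase-protocol; do
    PIG_CLASSPATH="$PIG_CLASSPATH:$HBASE_HOME/lib/$jar-1.1.2.jar"
done
export PIG_CLASSPATH="${PIG_CLASSPATH#:}"   # strip the leading colon
echo "$PIG_CLASSPATH"
```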

I tried changing the HBASE_HOME path in .bashrc to the Pig folder; when I ran the script, it was submitted as a MapReduce job but then failed with:

ClassNotFoundException: org.apache.htrace.Trace

Any insight on this?
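For what it's worth, org.apache.htrace.Trace lives in the htrace-core jar that ships with HBase 1.1.2, so one guess (untested here) is that the jar isn't reaching the MapReduce tasks; putting it on HADOOP_CLASSPATH before launching Pig is a common workaround:

```shell
# Sketch: expose htrace-core (which contains org.apache.htrace.Trace)
# to Hadoop/Pig; the path matches the REGISTER line in the script above.
HBASE_HOME=/usr/local/hbase-1.1.2
export HADOOP_CLASSPATH="$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
echo "$HADOOP_CLASSPATH"
```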

Just as a note to anyone who comes across this: I have not been able to solve this problem, but I was able to get Pig and HBase working together by running Pig 0.15.0 with HBase 0.98.12. – arie

1 Answer


I'm having the same problem. I've been looking at the libraries under the Pig home directory, and it seems that the current release only includes connectors for HBase 0.98. That's why you have to register the HBase libraries:

/pig/lib/h2$ ls
avro-mapred-1.7.5-hadoop2.jar
commons-collections4-4.0.jar
hbase-client-0.98.12-hadoop2.jar
hbase-common-0.98.12-hadoop2.jar
hbase-hadoop2-compat-0.98.12-hadoop2.jar
hbase-hadoop-compat-0.98.12-hadoop2.jar
hbase-protocol-0.98.12-hadoop2.jar
hbase-server-0.98.12-hadoop2.jar
hive-shims-0.23-0.14.0.jar
tez-api-0.7.0.jar
tez-common-0.7.0.jar
tez-dag-0.7.0.jar
tez-mapreduce-0.7.0.jar
tez-runtime-internals-0.7.0.jar
tez-runtime-library-0.7.0.jar
tez-yarn-timeline-history-with-acls-0.7.0.jar
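A quick way to see the mismatch is to pull the version out of the jar filenames; a throwaway helper (hypothetical, any equivalent one-liner works just as well):

```shell
# Hypothetical helper: extract the version suffix from an HBase jar filename.
hbase_jar_version() {
    basename "$1" .jar | sed 's/^hbase-[a-z2-]*-//'
}

hbase_jar_version /pig/lib/h2/hbase-client-0.98.12-hadoop2.jar        # → 0.98.12-hadoop2
hbase_jar_version /usr/local/hbase-1.1.2/lib/hbase-client-1.1.2.jar   # → 1.1.2
```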

I've also checked the Pig 0.15 docs,

https://pig.apache.org/docs/r0.15.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html

but the usage there looks correct.

Josh, did you ever solve it?