I've just started experimenting with Hadoop/HBase/Pig, so I'm really new to this. I can't find straightforward information about a problem I've run into, and I'm stuck.
I'm trying to load data from HBase using Pig, but I get this error:
Pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.backend.hadoop.hbase.HBaseStorage' with arguments '[info:*]'
When this code runs:
raw = LOAD 'hbase://testTable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:*') as (id:int);
From what I've found, the cause might be a jar that I'm not registering, or a mismatch between the HBase and Pig versions. Here's the complete script:
REGISTER /usr/local/hbase-1.1.2/lib/hbase-common-1.1.2.jar
REGISTER /usr/local/hbase-1.1.2/lib/hbase-client-1.1.2.jar
REGISTER /usr/local/hbase-1.1.2/lib/hbase-server-1.1.2.jar
REGISTER /usr/local/hbase-1.1.2/lib/hbase-protocol-1.1.2.jar
REGISTER /usr/local/hbase-1.1.2/lib/htrace-core-3.1.0-incubating.jar
REGISTER /usr/local/hbase-1.1.2/lib/zookeeper-3.4.6.jar
REGISTER /usr/local/hbase-1.1.2/lib/guava-12.0.1.jar
REGISTER /usr/local/hbase-1.1.2/lib/hbase-hadoop2-compat-1.1.2.jar
REGISTER /usr/local/hbase-1.1.2/lib/hbase-annotations-1.1.2.jar
REGISTER /usr/local/hbase-1.1.2/lib/hbase-thrift-1.1.2.jar
set hbase.zookeeper.quorum 'localhost'
raw = LOAD 'hbase://testTable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:*') as (id:int);
And the versions I'm using:
- Hadoop 2.5.1
- HBase 1.1.2
- Pig 0.15.0
Any insight on where the problem might be and what I can check would be greatly appreciated!
Update
In .bashrc I have set HBASE_HOME to the HBase installation folder:
export HBASE_HOME=/usr/local/hbase-1.1.2
I found that this error can happen when Pig uses HBase jars of a different version than the one installed. I looked in the /pig/lib/h2 folder and found HBase jars for 0.98.12, but I have 1.1.2 installed. In that case, should I just replace those files in the Pig folder with the ones from the HBase folder?
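To make the mismatch concrete, here is a minimal sketch of how the HBase versions in each lib folder could be listed. It assumes the jars are named like hbase-<module>-<version>.jar; the helper name hbase_versions and the PIG_HOME variable are hypothetical and not part of my actual setup.

```shell
#!/bin/sh
# Print the distinct HBase versions found among the jars in a directory.
# Assumption: jar names look like hbase-<module>-<version>[-suffix].jar
hbase_versions() {
    ls "$1" 2>/dev/null \
        | sed -n 's/^hbase-[a-z0-9-]*-\([0-9][0-9]*\.[0-9.]*[0-9]\).*\.jar$/\1/p' \
        | sort -u
}

# Jars shipped with Pig (PIG_HOME is a hypothetical variable; adjust the path).
hbase_versions "$PIG_HOME/lib/h2"

# Jars from the HBase installation.
hbase_versions "${HBASE_HOME:-/usr/local/hbase-1.1.2}/lib"
```

If the two calls print different versions (0.98.12 versus 1.1.2 in my case), Pig is bundling HBase jars that do not match the installed HBase.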
I tried changing the HBASE_HOME path in .bashrc to the Pig folder. When I ran the script, it seemed to be submitted as a MapReduce job, but it then failed with this error:
ClassNotFoundException: org.apache.htrace.Trace
Any insight on this?