
At work we are attempting to do the following:

  • Run Elastic MapReduce jobs via Amazon, which freezes Hadoop at version 0.20.205
  • Write output to HBase running on EC2, specifically, 0.92.1-cdh4.0.1 from Cloudera

What I've discovered so far is that my WordCount test appears to work when I package Apache HBase 0.92.1 in my Hadoop job (via Maven). I'm worried that this works only by accident and will blow up as my usage matures.

However, when I package HBase 0.92.1-cdh4.0.1 in my Hadoop job, I get a ClassNotFoundException:

https://emr-qa.eventbrite.com.s3.amazonaws.com/logs/j-RWJ75VR11SLB/steps/1/stderr

  • Does the Apache HBase jar play nicely with the CDH Hbase server?
  • Is mixing versions and packages like this a horrible idea?
Looks like the answer is "no"... from one of the HBase committers: bit.ly/PqwNqD (comment by brianz)

1 Answer


I had the same problem, and it seems they are not compatible (there is a problem with the connections). The solution is to change the Maven dependency to use Cloudera's jars:

 <properties>
     <hbase.version>0.92.1-cdh4.0.1</hbase.version>
     <hadoop.version>2.0.0-cdh4.0.1</hadoop.version>
 </properties>

 <repositories>
     <repository>
         <id>cloudera</id>
         <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
     </repository>
 </repositories>

And later in the dependencies:

 <dependency>
     <groupId>org.apache.hbase</groupId>
     <artifactId>hbase</artifactId>
     <version>${hbase.version}</version>
     <exclusions>
         <exclusion>
             <groupId>org.apache.thrift</groupId>
             <artifactId>thrift</artifactId>
         </exclusion>
     </exclusions>
 </dependency>
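
The `hadoop.version` property declared above isn't used in the snippet shown; presumably it feeds a matching Hadoop dependency. A minimal sketch, assuming CDH4 publishes a `hadoop-client` artifact (check what your distro actually provides in the Cloudera repository):

```xml
<!-- Sketch: pull the matching CDH Hadoop jars via the same property.
     The hadoop-client artifactId is an assumption; verify it against
     the Cloudera repository for your distro version. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <!-- provided: the cluster supplies Hadoop at runtime, so don't
         bundle these jars into the job package -->
    <scope>provided</scope>
</dependency>
```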

You can change the property and repackage whenever you want to use the code with another distro.
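
One way to capture that "repackage per distro" workflow in the POM itself is with Maven profiles, so each build picks a consistent pair of HBase/Hadoop versions. A sketch, with hypothetical profile ids and the versions mentioned in this question:

```xml
<!-- Sketch: one profile per target distro (the profile ids are made up).
     Activate with: mvn clean package -Pcdh4   (or -Papache) -->
<profiles>
    <profile>
        <id>cdh4</id>
        <properties>
            <hbase.version>0.92.1-cdh4.0.1</hbase.version>
            <hadoop.version>2.0.0-cdh4.0.1</hadoop.version>
        </properties>
    </profile>
    <profile>
        <id>apache</id>
        <properties>
            <hbase.version>0.92.1</hbase.version>
            <hadoop.version>0.20.205.0</hadoop.version>
        </properties>
    </profile>
</profiles>
```

For one-off builds, command-line properties (e.g. `mvn package -Dhbase.version=0.92.1`) also take precedence over `<properties>` in the POM, which avoids the profile machinery entirely.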