I have a distributed Hadoop cluster with HBase running on its HDFS. To build a MapReduce job that uses HBase, I include these dependencies:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.2.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>1.2.3</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>1.2.3</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
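For context, the driver I'm compiling against those jars looks roughly like this. It's a minimal sketch: MyHBaseJob, the table name, and the mapper logic are placeholders of mine, not from any HBase example.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MyHBaseJob {

    // Scans an HBase table row by row; the actual per-row logic is elided.
    static class MyMapper extends TableMapper<Text, IntWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            // ... per-row processing ...
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "my-hbase-job");
        job.setJarByClass(MyHBaseJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // batch rows per RPC for MR scans
        scan.setCacheBlocks(false);  // don't churn the region server block cache

        // Binds the mapper to the table; by default this also calls
        // addDependencyJars(job), shipping the HBase jars with the job.
        TableMapReduceUtil.initTableMapperJob(
                "my_table",          // placeholder table name
                scan,
                MyMapper.class,
                Text.class,
                IntWritable.class,
                job);

        job.setNumReduceTasks(0);                         // map-only sketch
        job.setOutputFormatClass(NullOutputFormat.class); // no file output

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}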
I'm trying to figure out how exactly to deploy all those libraries. Should Hadoop just include $HBASE_HOME/lib/* on its classpath (e.g. via HADOOP_CLASSPATH in hadoop-env.sh)? There are a LOT of overlaps and version conflicts in there. It seems like I should only need some subset, but the HBase documentation offers only a little help:
Replace the Hadoop Bundled With HBase! Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its lib directory. The bundled jar is ONLY for use in standalone mode. In distributed mode, it is critical that the version of Hadoop that is out on your cluster match what is under HBase. Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues. Make sure you replace the jar in HBase everywhere on your cluster. Hadoop version mismatch issues have various manifestations but often all looks like its hung up.
I can't find where it affirmatively tells you which HBase libraries you need to add to Hadoop's compute nodes.
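The closest I've found is TableMapReduceUtil.addDependencyJars, which ships jars through the job's distributed cache so the compute nodes don't need HBase pre-installed. Here's a sketch of the explicit variant; the jar list is my guess at the minimal subset, not something the docs confirm:

import java.io.IOException;

import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class ShipHBaseJars {
    // Attach only the jars containing these classes to the job's
    // distributed cache, instead of copying $HBASE_HOME/lib/* onto
    // every compute node's classpath.
    static void shipHBaseJars(Job job) throws IOException {
        TableMapReduceUtil.addDependencyJars(job.getConfiguration(),
            org.apache.hadoop.hbase.client.Put.class,    // hbase-client
            org.apache.hadoop.hbase.HConstants.class,    // hbase-common
            org.apache.hadoop.hbase.protobuf.generated.ClientProtos.class, // hbase-protocol
            com.google.protobuf.Message.class,           // protobuf-java
            org.apache.zookeeper.ZooKeeper.class,        // zookeeper
            com.google.common.collect.Lists.class);      // guava
    }
}

If that's the intended deployment model, nothing from $HBASE_HOME/lib needs to live on the compute nodes at all, but I'd like confirmation either way.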