
I have a distributed Hadoop cluster with HBase running on its HDFS. To build a map/reduce job that uses HBase, I include these dependencies:

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.2.3</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.2.3</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

I'm trying to figure out exactly how to deploy all of those libraries. Should Hadoop just include $HBASE_HOME/lib/* on its classpath? There are a LOT of overlaps and version conflicts among those jars, so it seems like I should only need some subset, but the HBase documentation offers only a little help:

Replace the Hadoop Bundled With HBase! Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its lib directory. The bundled jar is ONLY for use in standalone mode. In distributed mode, it is critical that the version of Hadoop that is out on your cluster match what is under HBase. Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues. Make sure you replace the jar in HBase everywhere on your cluster. Hadoop version mismatch issues have various manifestations but often all looks like its hung up.

I can't find where it tells you affirmatively what hbase libraries you need to add to hadoop's compute nodes.


1 Answer


I tried to answer this question experimentally. The minimum set of jars I seem to need to make it work is this:

hbase-client-1.2.3.jar -> ../../../../hbase/lib/hbase-client-1.2.3.jar
hbase-common-1.2.3.jar -> ../../../../hbase/lib/hbase-common-1.2.3.jar
hbase-hadoop2-compat-1.2.3.jar -> ../../../../hbase/lib/hbase-hadoop2-compat-1.2.3.jar
hbase-hadoop-compat-1.2.3.jar -> ../../../../hbase/lib/hbase-hadoop-compat-1.2.3.jar
hbase-prefix-tree-1.2.3.jar -> ../../../../hbase/lib/hbase-prefix-tree-1.2.3.jar
hbase-protocol-1.2.3.jar -> ../../../../hbase/lib/hbase-protocol-1.2.3.jar
hbase-server-1.2.3.jar -> ../../../../hbase/lib/hbase-server-1.2.3.jar
metrics-core-2.2.0.jar -> ../../../../hbase/lib/metrics-core-2.2.0.jar

To explain slightly: my Hadoop installation is in /home/hadoop and my HBase installation is in /home/hbase. Both were "installed" simply by unzipping the Apache Hadoop and HBase tarballs into the "hadoop" and "hbase" users' home directories, respectively. The reducer is an empty TableReducer.
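For anyone wanting to script the links above rather than create them by hand, here is a sketch. It uses throwaway temp directories standing in for the real paths, so it is safe to try as-is; on a real cluster you would substitute /home/hbase/hbase/lib and the Hadoop lib directory you actually put on the classpath.

```shell
# Stand-in directories; replace with your real HBase lib and Hadoop lib paths.
HBASE_LIB="$(mktemp -d)/hbase/lib"
HADOOP_LIB="$(mktemp -d)/hadoop/lib"
mkdir -p "$HBASE_LIB" "$HADOOP_LIB"

# The jar set found to be the working minimum (versions as in my install).
jars="hbase-client hbase-common hbase-hadoop-compat hbase-hadoop2-compat \
      hbase-prefix-tree hbase-protocol hbase-server"

# Create fake jars so this sketch runs anywhere; skip this step for real use.
for j in $jars; do touch "$HBASE_LIB/$j-1.2.3.jar"; done
touch "$HBASE_LIB/metrics-core-2.2.0.jar"

# Symlink each required jar into Hadoop's lib directory.
for j in $jars; do
  ln -s "$HBASE_LIB/$j-1.2.3.jar" "$HADOOP_LIB/"
done
ln -s "$HBASE_LIB/metrics-core-2.2.0.jar" "$HADOOP_LIB/"

ls "$HADOOP_LIB"
```

Remember the quoted documentation's warning: whatever set you settle on has to be replicated on every compute node.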

It seems to run - or at least it doesn't throw any ClassNotFoundException related to HBase.

I'm not sure whether this is the complete set ... I can only say that my empty TableReducer needs exactly these things.
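One alternative worth noting, rather than symlinking jars by hand: the `hbase` launcher script has a `mapredcp` subcommand that prints the minimal classpath HBase itself thinks a MapReduce job needs, and it can be fed into HADOOP_CLASSPATH on the machine that submits the job. A sketch (the jar and driver class names are hypothetical placeholders; this assumes `hbase` is on the submitting user's PATH):

```shell
# Let HBase compute its own minimal MapReduce classpath.
# Put this in hadoop-env.sh, or export it before submitting the job.
export HADOOP_CLASSPATH="$(hbase mapredcp):${HADOOP_CLASSPATH}"

# Then submit as usual (placeholder jar/class names):
hadoop jar my-job.jar com.example.MyJobDriver
```

Comparing `hbase mapredcp`'s output against the symlink list above would be a good way to check whether the experimentally found set is the real one.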