
I have a Spark job (written in Scala) that retrieves data from an HBase table hosted on another server. To do this, I first create the HBaseContext like this:

val hBaseContext: HBaseContext = new HBaseContext(sparkContext, HBaseConfiguration.create())

I run the Spark job with spark-submit and pass the arguments it needs, something like this:

spark-submit  --master=local[*] --executor-memory 4g --executor-cores 2 --num-executors 2 --jars $(for x in `ls -1 ~/spark_libs/*.jar`; do readlink -f $x; done | paste -s | sed -e 's/\t/,/g') --class com.sparksJob.MyMainClass myJarFile.jar "$@"

The problem is that this connects to ZooKeeper on localhost, whereas I want it to connect to the ZooKeeper on another server (the one where HBase runs).

Hardcoding this information works:

val configuration: Configuration = new Configuration()
configuration.set("hbase.zookeeper.quorum", "10.190.144.8")
configuration.set("hbase.zookeeper.property.clientPort", "2181")
val hBaseContext:HBaseContext = new HBaseContext(sparkContext, HBaseConfiguration.create(configuration))

However, I want this to be configurable.

How can I tell spark-submit the path of the hbase-site.xml file to use?

Can you pass the ZooKeeper quorum and port through the Scala app arguments? - maxteneff
Which JAR file did you get HBaseContext from? Apart from Ted Malaska's git repo (github.com/tmalaska/SparkOnHBase) I could not find this class. Could you share your SBT or POM file? It would be very helpful. - Dave
Did you ever find a solution? I am running into the same problem. - rad i

1 Answer


You can pass hbase-site.xml as the argument of the --files option. Your example would become:

spark-submit  --master yarn-cluster --files /etc/hbase/conf/hbase-site.xml --executor-memory 4g --executor-cores 2 --num-executors 2 --jars $(for x in `ls -1 ~/spark_libs/*.jar`; do readlink -f $x; done | paste -s | sed -e 's/\t/,/g') --class com.sparksJob.MyMainClass myJarFile.jar "$@"

Note that the master is set to yarn-cluster. With any other master setting, the hbase-site.xml would be ignored.
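
If you would rather not depend on the file being picked up implicitly, a minimal sketch of loading it explicitly might look like the following. This is not part of the original setup: HBaseConfLoader and the siteXmlPath argument are hypothetical names, and it assumes the path you pass is readable from the local filesystem of the process that builds the configuration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration

// Hypothetical helper, not part of HBase or Spark.
object HBaseConfLoader {
  // siteXmlPath: optional path to an hbase-site.xml, e.g. taken from the
  // application arguments (an assumption made for this sketch).
  def load(siteXmlPath: Option[String]): Configuration = {
    // Start from the HBase defaults plus any hbase-site.xml on the classpath.
    val conf = HBaseConfiguration.create()
    // Layer the explicitly given file on top, so its values override the defaults.
    siteXmlPath.foreach(p => conf.addResource(new Path(p)))
    conf
  }
}
```

You would then build the context with something like new HBaseContext(sparkContext, HBaseConfLoader.load(args.headOption)), passing the path as the first application argument after myJarFile.jar. As far as I know, a path that does not exist is skipped silently rather than reported, so it is worth logging conf.get("hbase.zookeeper.quorum") once to check which quorum was actually picked up.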