
I am trying the example from https://www.mapr.com/developercentral/code/loading-hbase-tables-spark#.VKtxqivF_fS . The table is getting created and the rows are inserted when I check through the HBase shell. But the next step of creating the RDD and then running the count gives the following error. Help is appreciated.

java.lang.IllegalStateException: unread block data
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:159)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
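For context, the RDD-and-count step in that example looks roughly like the sketch below (a reconstruction, not the exact MapR code; the table name is a placeholder). The call to `newAPIHadoopRDD` is where the executors first need the HBase classes on their classpath:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SimpleApp"))

    // Point the HBase input format at the table (name is illustrative)
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "testtable")

    // Build an RDD over the HBase table. Deserializing these HBase types
    // on the workers is what throws "unread block data" when the HBase
    // jars are missing from the executor classpath.
    val hBaseRDD = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    println(hBaseRDD.count())
  }
}
```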

1 Answer


The issue was resolved by using the --jars option to pass the HBase jar files to the workers. I had only been using --driver-class-path, which puts the jars on the driver's classpath but does not ship them to the executors, so the tasks could not deserialize the HBase classes.

For example:

spark-submit --master spark://sparkhost:7077 \
 --class SimpleApp \
 --jars /home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-client-0.98.7-hadoop2.jar,\
/home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-server-0.98.7-hadoop2.jar,\
/home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-protocol-0.98.7-hadoop2.jar,\
/home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-common-0.98.7-hadoop2.jar,\
/home/hadoop/BigDataEDW/htrace-core-2.04.jar \
 /home/hadoop/BigDataEDW/hbase-spark_2.10-1.0.0-SNAPSHOT.jar
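If you submit this job often, an alternative (untested here, but standard Spark configuration) is to set the same jar list once in conf/spark-defaults.conf instead of on every command line; spark.jars is the config equivalent of --jars and is likewise distributed to the executors:

```
# conf/spark-defaults.conf -- same jars as the --jars list above
spark.jars /home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-client-0.98.7-hadoop2.jar,/home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-server-0.98.7-hadoop2.jar,/home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-protocol-0.98.7-hadoop2.jar,/home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-common-0.98.7-hadoop2.jar,/home/hadoop/BigDataEDW/htrace-core-2.04.jar
```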