0 votes

I am running 'Hive on Spark' with Hive 2.3.3 and Spark 2.0.0 in Spark standalone mode (no YARN). My Hive tables are external and point to S3. In my hive-site.xml, spark.submit.deployMode is set to client and spark.master is set to spark://actualmaster:7077, and in the Spark UI I can see that the Spark master has an available worker with resources.
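
For reference, the relevant part of my hive-site.xml looks roughly like this (trimmed to the settings mentioned above, plus hive.execution.engine, which is what routes queries to Spark):

    <property>
      <name>hive.execution.engine</name>
      <value>spark</value>
    </property>
    <property>
      <name>spark.master</name>
      <value>spark://actualmaster:7077</value>
    </property>
    <property>
      <name>spark.submit.deployMode</name>
      <value>client</value>
    </property>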

In beeline I run select * from table; and it works. Then I run select count(*) from table; and I get the error below.

/usr/lib/apache-hive-2.3.3-bin/lib/hive-exec-2.3.3.jar contains the supposedly missing class, and HiveServer2 is started with:

    nohup $HIVE_HOME/bin/hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console &>> $HIVE_HOME/logs/hiveserver2.log &
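
As a quick sanity check (a one-liner using standard JDK tooling, not anything specific to my setup), the class can be confirmed inside that jar with:

    jar tf /usr/lib/apache-hive-2.3.3-bin/lib/hive-exec-2.3.3.jar | grep SparkCounters
    # should list org/apache/hive/spark/counter/SparkCounters.class if the jar really contains it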

The error below is from viewing the failed job in the Spark UI:

Failed stageid0: mapPartitionsToPair at MapTran.java:40

java.lang.NoClassDefFoundError: Lorg/apache/hive/spark/counter/SparkCounters;
    at java.lang.Class.getDeclaredFields0(Native Method)
    at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
    at java.lang.Class.getDeclaredField(Class.java:2068)
    at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1803)
    at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:79)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:494)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:482)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:482)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:379)
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:669)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1875)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1744)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2032)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:426)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.spark.counter.SparkCounters
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 45 more

Note that if I run set spark.master=local; in beeline, then the count(*) works. What am I missing to make it work without setting spark.master to local?

What is your configuration for hive.metastore.warehouse.dir? Is it on S3 or HDFS? – jxc
S3 warehouse dir – tooptoop4
I had a similar issue when setting up Hive on Spark with GlusterFS, which I mounted on all workers with the same path. In my case I didn't have to run Spark in local mode, but only the worker on the same server as HiveServer2 actually worked; apps submitted to all other workers got a NullPointerException. From my googling, Hive has strong ties to the Hadoop ecosystem, and a new file system may have to support its APIs; I think GlusterFS just isn't there yet. Not sure about S3. Also, the error messages you got are different from what I had. – jxc

1 Answer

0 votes

Please try adding the path to the Hive jars in the spark-submit command and check, because the same thing happened to me. If that works, then check your Spark conf files; you have probably not pointed them at the Hive jars correctly.
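
For example, something along these lines (a sketch only, under the assumption that the standalone executors simply cannot see hive-exec on their classpath; the jar path is the one from the question, and the jar must exist on every worker):

    # Option 1: copy the Hive execution jar into Spark's jars directory on every worker
    cp /usr/lib/apache-hive-2.3.3-bin/lib/hive-exec-2.3.3.jar $SPARK_HOME/jars/

    # Option 2: reference it via the executor classpath, e.g. in spark-defaults.conf on the
    # host HiveServer2 submits from (the jar must still exist at that path on each worker):
    # spark.executor.extraClassPath /usr/lib/apache-hive-2.3.3-bin/lib/hive-exec-2.3.3.jar

If that fixes it, the root cause was most likely that the standalone workers never had the Hive jars locally, which would also explain why spark.master=local (where everything runs inside the HiveServer2 JVM) succeeds.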