
I have a problem with Hive on Spark. When I run a simple query like

select * from table_name

in the Hive console, everything works well, but when I execute

select count(*) from table_name 

the query terminates with the following error:

Query ID = ab_20160515134700_795fc14c-e89b-4172-bcc6-0cfcffadcd88
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = d5e1856e-de67-4e2d-a914-ca1aae324b7f
Status: SENT
Failed to execute spark task, with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

versions:

hadoop-2.7.2
apache-hive-2.0.0
spark-1.6.0-bin-hadoop2
scala: 2.11.8

I have set spark.master in hive-site.xml, and now I get:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '8ffe7ea3-aaf4-456c-ae18-23c572a766c5'. Error: Child process exited before connecting back
    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:98) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:106) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:158) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1840) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1584) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1361) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) [hive-cli-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) [hive-cli-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) [hive-cli-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:778) [hive-cli-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717) [hive-cli-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645) [hive-cli-2.0.0.jar:2.0.0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_77]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_77]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_77]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_77]
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221) [spark-assembly-1.6.0-hadoop2.6.0.jar:1.6.0]
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136) [spark-assembly-1.6.0-hadoop2.6.0.jar:1.6.0]
Caused by: java.lang.RuntimeException: Cancel client '8ffe7ea3-aaf4-456c-ae18-23c572a766c5'. Error: Child process exited before connecting back
    at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:180) ~[hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450) ~[hive-exec-2.0.0.jar:2.0.0]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_77]
16/05/16 18:00:33 [Driver]: WARN client.SparkClientImpl: Child process exited with code 1


I have built Spark 1.6.1 and Hive 2.0.0, and now the error has changed to:

Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Iterable
    at org.apache.hadoop.hive.ql.parse.spark.GenSparkProcContext.<init>(GenSparkProcContext.java:163)
    at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.generateTaskTree(SparkCompiler.java:195)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:258)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10861)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:239)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1158)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1253)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Can you please paste the full stack trace? - Ram Ghadiyaram
That means your jar files are not consistent, i.e., not compiled from the same code. Please go through the link; a similar kind of problem was discussed at issues.apache.org/jira/browse/HIVE-9970 - Ram Ghadiyaram

1 Answer


I ran into the same issue as you on Hive 2.0.0 and Spark 1.6.1. As mentioned above, it has been discussed at issues.apache.org/jira/browse/HIVE-9970.

That said, to build Hive:

  1. Download the Hive source package.
  2. Set the right Hadoop/Spark/Tez versions in pom.xml.
  3. Increase the memory limits for Maven. I use export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
  4. Build Hive using Maven: mvn clean package -Pdist -DskipTests
  5. The result is at packaging/target/apache-hive-2.x.y-bin. Configure hive-site.xml there. (A command-line sketch of these steps follows below.)
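Put together, the Hive build looks roughly like this. It is only a sketch: the download URL, version, and directory names are assumptions, so substitute whatever release you are actually building.

    # Assumed example: fetch the Hive 2.0.0 sources from the Apache archive
    wget https://archive.apache.org/dist/hive/hive-2.0.0/apache-hive-2.0.0-src.tar.gz
    tar xzf apache-hive-2.0.0-src.tar.gz
    cd apache-hive-2.0.0-src

    # Edit the Hadoop/Spark/Tez version properties in pom.xml before building,
    # then give Maven enough memory and build the binary distribution
    export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
    mvn clean package -Pdist -DskipTests

    # The packaged build lands under packaging/target/apache-hive-2.0.0-bin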

For Spark:

  1. Download the Spark source package.
  2. Set the right Hadoop version in pom.xml.
  3. Build Spark without Hive using ./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
  4. The result is in dist/. Configure spark-defaults.conf. (See the sketch after this list.)
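The Spark side, as a rough sketch along the same lines (again, the source URL and version are assumptions; use whatever matches your cluster):

    # Assumed example: fetch the Spark 1.6.1 sources
    wget https://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1.tgz
    tar xzf spark-1.6.1.tgz
    cd spark-1.6.1

    # Set the Hadoop version in pom.xml, then build a distribution that bundles
    # neither the Hadoop nor the Hive classes
    ./make-distribution.sh --name "hadoop2-without-hive" --tgz \
        "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"

    # The distribution lands under dist/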

Since you've built Spark without bundled Hadoop, you'll need to add the Hadoop package jar paths to $SPARK_DIST_CLASSPATH. See this documentation page. Additionally, you can read the Hive on Spark guide as a reference.
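A common way to do that (a minimal sketch, assuming the hadoop command is on the PATH of the machine running Spark) is to export the classpath in conf/spark-env.sh:

    # conf/spark-env.sh -- point the "Hadoop free" Spark build at the cluster's Hadoop jars
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)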