
I'm running an EMR 5.0 cluster and using Hue to create an Oozie workflow that submits a Spark 2.0 job. I have run the job with spark-submit directly on YARN and as a step on the same cluster without any problem. But when I submit it through Hue I get the following error:

java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SessionState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:949)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:111)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:110)
    at org.apache.spark.sql.SparkSession.conf$lzycompute(SparkSession.scala:133)
    at org.apache.spark.sql.SparkSession.conf(SparkSession.scala:133)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:838)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:838)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:838)
    at be.infofarm.App$.main(App.scala:22)
    at be.infofarm.App.main(App.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:946)
    ... 19 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SharedState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:949)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:100)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:100)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:99)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:98)
    at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:153)
    ... 24 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:946)
    ... 30 more
Caused by: java.lang.Exception: Could not find resource path for Web UI: org/apache/spark/sql/execution/ui/static
    at org.apache.spark.ui.JettyUtils$.createStaticHandler(JettyUtils.scala:182)
    at org.apache.spark.ui.WebUI.addStaticHandler(WebUI.scala:119)
    at org.apache.spark.sql.execution.ui.SQLTab.<init>(SQLTab.scala:32)
    at org.apache.spark.sql.internal.SharedState$$anonfun$createListenerAndUI$1.apply(SharedState.scala:96)
    at org.apache.spark.sql.internal.SharedState$$anonfun$createListenerAndUI$1.apply(SharedState.scala:96)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.sql.internal.SharedState.createListenerAndUI(SharedState.scala:96)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:44)
    ... 35 more

When my Spark job does not use spark.sql or SparkSession (and uses SparkContext instead), it runs fine. If anyone has a clue about what is going on, I would be very grateful.

EDIT 1

My Maven build configuration (including the assembly plugin):

<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
  <plugin>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <version>3.1.3</version>
    <executions>
      <execution>
        <goals>
          <goal>compile</goal>
          <goal>testCompile</goal>
        </goals>
        <configuration>
          <args>
            <arg>-dependencyfile</arg>
            <arg>${project.build.directory}/.scala_dependencies</arg>
          </args>
        </configuration>
      </execution>
    </executions>
  </plugin>

  <plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <configuration>
      <archive>
        <manifest>
          <mainClass>be.infofarm.App</mainClass>
        </manifest>
      </archive>
      <descriptorRefs>
        <descriptorRef>jar-with-dependencies</descriptorRef>
      </descriptorRefs>
    </configuration>
    <executions>
      <execution>
        <id>make-assembly</id> <!-- this is used for inheritance merges -->
        <phase>package</phase> <!-- bind to the packaging phase -->
        <goals>
          <goal>single</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
</plugins>
</build>

It has something to do with how you build your fat jar. JettyUtils is not able to retrieve files from the static package org/apache/spark/sql/execution/ui/static. Can you provide your Maven assembly plugin configuration? – RudyVerboven
For the plugin configuration, see Edit 1. – T. Bombeke
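
One way to test the fat-jar theory from the comment above is to list the jar's entries and look for the resource directory named in the root-cause exception. The sketch below is only an illustration: the function name has_sql_ui_static and the jar path in the usage comment are hypothetical, not taken from the question.

```shell
# Resource directory that JettyUtils could not find (from the stack trace).
RESOURCE_DIR="org/apache/spark/sql/execution/ui/static"

# Hypothetical helper: reads a jar listing (one entry per line) on stdin
# and succeeds if at least one entry lies under RESOURCE_DIR.
has_sql_ui_static() {
  grep -q "^${RESOURCE_DIR}/"
}

# Usage (the jar path is an assumption; use your own assembly output):
#   jar tf target/myapp-jar-with-dependencies.jar | has_sql_ui_static \
#     && echo "static UI resources present" \
#     || echo "static UI resources missing"
```

If the check reports the resources as missing, the way the assembly merges the Spark SQL jar is worth a closer look; if they are present, the problem more likely lies in the classpath the job gets when launched through Oozie.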

1 Answer

When you run the jar with spark-submit, all dependent jars are available on the classpath of the machine, but when you run the same job through Oozie, those jars may not be available in Oozie's sharelib. You can check which jars the Spark sharelib contains by executing the following command:

oozie admin -shareliblist spark

Step 1. Upload the required jars from the local machine to HDFS (lib_timestamp stands for the timestamped sharelib directory on your cluster):

hdfs dfs -put /usr/lib/spark/jars/*.jar /user/oozie/share/lib/lib_timestamp/spark/ 

Step 2. Uploading the jars to HDFS does not by itself add them to the sharelib; you also need to update the sharelib by executing:

oozie admin -sharelibupdate
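
The steps above can be sketched as one script. This is a sketch under assumptions, not a tested procedure for EMR: the helper names latest_sharelib_dir and refresh_spark_sharelib are made up for illustration, the sharelib root /user/oozie/share/lib and the local path /usr/lib/spark/jars are the ones used in the commands above, and the sharelib directories are assumed to be named lib_ followed by a numeric timestamp. The oozie CLI also needs to know the server URL, either through the OOZIE_URL environment variable or the -oozie option.

```shell
# Sharelib root on HDFS, as used in the commands above.
SHARELIB_ROOT="/user/oozie/share/lib"

# Hypothetical helper: reads `hdfs dfs -ls` output on stdin and prints the
# newest lib_<timestamp> directory (assumes numeric timestamps of equal length,
# so a plain lexicographic sort puts the newest one last).
latest_sharelib_dir() {
  grep -o "${SHARELIB_ROOT}/lib_[0-9][0-9]*\$" | sort | tail -n 1
}

# Hypothetical wrapper around the steps from the answer; defined but not
# called here, because it needs a live cluster.
refresh_spark_sharelib() {
  sharelib_dir="$(hdfs dfs -ls "$SHARELIB_ROOT" | latest_sharelib_dir)"
  if [ -z "$sharelib_dir" ]; then
    echo "no lib_<timestamp> directory found under $SHARELIB_ROOT" >&2
    return 1
  fi
  # Step 1: upload the Spark jars into the spark sharelib directory.
  hdfs dfs -put /usr/lib/spark/jars/*.jar "${sharelib_dir}/spark/"
  # Step 2: tell Oozie to reload the sharelib, then list it to verify.
  oozie admin -sharelibupdate
  oozie admin -shareliblist spark
}
```

Calling refresh_spark_sharelib on the master node would run both steps and print the resulting Spark sharelib listing, which should then include the uploaded jars.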

Hope this helps.