
I'm executing a spark-submit script in an EMR step that runs the main class from my super JAR, like

  spark-submit \
    ....
    --class ${MY_CLASS} "${SUPER_JAR_S3_PATH}"

... etc

but by default Spark loads file:/usr/lib/spark/jars/guice-3.0.jar, which contains com.google.inject.internal.InjectorImpl, a class that also exists in the Guice 4.x jar inside my super JAR. This results in a java.lang.IllegalAccessError when my service boots up.
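One quick way to see the clash (a rough sketch; the assembly jar name below is a placeholder for my actual artifact) is to list the conflicting class in both jars:

  # Confirm both jars ship com.google.inject.internal.InjectorImpl
  # (the second jar name is a placeholder; adjust to your build output)
  unzip -l /usr/lib/spark/jars/guice-3.0.jar | grep 'com/google/inject/internal/InjectorImpl'
  unzip -l my-super-assembly.jar | grep 'com/google/inject/internal/InjectorImpl'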

I've tried setting some Spark conf in the spark-submit to put my super JAR on the classpath, in hopes of it getting loaded first, before Spark loads guice-3.0.jar. It looks like:

  --jars "${ASSEMBLY_JAR_S3_PATH}" \
  --driver-class-path "/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:${SUPER_JAR_S3_PATH}" \
  --conf spark.executor.extraClassPath="/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:${SUPER_JAR_S3_PATH}" \

but this results in the same error.

Is there a way to remove that guice-3.0.jar from the default Spark classpath so my code can use the InjectorImpl that's packaged in the Guice 4.x JAR? I'm also running Spark in client mode, so I can't use spark.driver.userClassPathFirst or spark.executor.userClassPathFirst.
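For clarity, by "remove" I mean something like the following untested sketch, run on each node before Spark starts (e.g. from an EMR bootstrap action or an SSH session); the path is the stock EMR location:

  # Rough sketch: move the conflicting jar out of Spark's default jars dir
  # (untested; would need to run on the master and every core/task node)
  sudo mv /usr/lib/spark/jars/guice-3.0.jar /tmp/guice-3.0.jar.disabled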


1 Answer


One way is to point at the lib directory where the old Guice jar lives and exclude it while building the --jars list.

A sample shell script for spark-submit:

  #!/bin/sh
  # Path to the Guice 4.x jar you want Spark to use
  export latestguicejar='your path to latest guice jar'

  # Build OTHER_JARS: every jar under Spark's lib dir except guice-3.0.jar.
  # Compare against the basename, since find returns full paths.
  OTHER_JARS=""
  for eachjarinlib in $(find /usr/lib/spark/jars/ -name '*.jar'); do
    if [ "${eachjarinlib##*/}" != "guice-3.0.jar" ]; then
      OTHER_JARS="$eachjarinlib,$OTHER_JARS"
    fi
  done
  # Strip the trailing comma left by the loop
  OTHER_JARS="${OTHER_JARS%,}"
  echo "--- final list of jars: $OTHER_JARS"

  spark-submit --verbose --class <yourclass> \
    ... OTHER OPTIONS \
    --jars "$OTHER_JARS,$latestguicejar,APPLICATIONJARTOBEADDEDSEPARATELY.JAR"

Also see Holden's answer, and check what is available in your version of Spark.

As per the runtime environment section of the docs, the userClassPathFirst properties are present in the latest version of Spark as of today:

spark.executor.userClassPathFirst
spark.driver.userClassPathFirst

To use these, you can build an uber JAR with all your application-level dependencies.
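For example, a minimal sketch (the class name and jar name are placeholders; as the question notes, these properties mainly help when you can run in cluster mode):

  # Prefer user-supplied jars over Spark's bundled ones
  spark-submit --verbose \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true \
    --class com.example.Main \
    my-uber-assembly.jar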