1 vote

There is a conflict between a jar in my project and a jar in Spark 2.4.0's jars folder. My Retrofit dependency brings in okhttp-3.13.1.jar (verified with mvn dependency:tree), but Spark on the server has okhttp-3.8.1.jar, and I get a NoSuchMethodError. So I'm trying to supply my jar explicitly to override Spark's copy. When I run the spark-submit command in client mode, it picks up the explicit jar I provided, but when I run the same command in cluster mode, it fails to override the jar on the worker nodes, and the executors use Spark's old jar, which leads to the NoSuchMethodError. My jar is a fat jar, yet Spark's jar somehow takes precedence over it. Deleting the jars Spark provides would probably work, but I can't do that because other services may be using them.

Following is my command:

./spark-submit \
  --class com.myJob \
  --conf spark.yarn.appMasterEnv.ENV=uat \
  --conf spark.driver.memory=12g \
  --conf spark.executor.memory=40g \
  --conf spark.sql.warehouse.dir=/user/myuser/spark-warehouse \
  --conf "spark.driver.extraClassPath=/home/test/okhttp-3.13.1.jar" \
  --conf "spark.executor.extraClassPath=/home/test/okhttp-3.13.1.jar" \
  --jars /home/test/okhttp-3.13.1.jar \
  --conf spark.submit.deployMode=cluster \
  --conf spark.yarn.archive=hdfs://namenode/frameworks/spark/spark-2.4.0-archives/spark-2.4.0-archive.zip \
  --conf spark.master=yarn \
  --conf spark.executor.cores=4 \
  --queue public \
  file:///home/mytest/myjar-SNAPSHOT.jar


final Retrofit retrofit = new Retrofit.Builder()
        .baseUrl(configuration.ApiUrl()) // this line throws the NoSuchMethodError
        .addConverterFactory(JacksonConverterFactory.create(new ObjectMapper()))
        .build();

My mvn dependency:tree doesn't show any other transitive okhttp jars in my build, and the job runs fine locally in IntelliJ as well as with mvn clean install.
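To see which copy actually wins at runtime, a quick diagnostic is to print the code-source location of the conflicting class from inside the job. This is a minimal sketch (the ClassOrigin class name is illustrative, and getCodeSource() can return null for bootstrap classes, so guard it in real code):

import okhttp3.OkHttpClient;

public class ClassOrigin {
    public static void main(String[] args) {
        // Prints the jar OkHttpClient was loaded from. Run this on the driver,
        // or log it from a task on an executor, to see whether Spark's
        // okhttp-3.8.1.jar or the supplied okhttp-3.13.1.jar won the
        // classpath race.
        System.out.println(OkHttpClient.class
                .getProtectionDomain().getCodeSource().getLocation());
    }
}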

I even tried providing an HDFS path to the jar (hdfs://users/myuser/myjars/okhttp-3.13.1.jar), with no luck. Can someone shed some light?

I get the following exception if I set both --conf "spark.driver.userClassPathFirst=true" and --conf "spark.executor.userClassPathFirst=true":

Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.<init>(YarnSparkHadoopUtil.scala:48)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.<clinit>(YarnSparkHadoopUtil.scala)
    at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply$mcJ$sp(Client.scala:81)
    at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply(Client.scala:81)
    at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply(Client.scala:81)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.deploy.yarn.Client.<init>(Client.scala:80)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl cannot be cast to org.apache.hadoop.yarn.api.records.Priority
    at org.apache.hadoop.yarn.api.records.Priority.newInstance(Priority.java:39)
    at org.apache.hadoop.yarn.api.records.Priority.<clinit>(Priority.java:34)
    ... 15 more

But if I set only --conf "spark.executor.userClassPathFirst=true", then it hangs.

Have you tried setting spark.driver.userClassPathFirst=true / spark.executor.userClassPathFirst=true? Just beware of potentially disastrous side-effects :) – mazaneicha
I tried the mentioned suggestions, and I get another exception; I've added it to the description now, please check. – Saawan
That's what disastrous side-effects look like, especially when you've built a fat jar with dependency versions different from the target environment. – mazaneicha
Could you suggest some ideas to mitigate the problem? – Saawan
Why does the jar override work fine in client mode even without userClassPathFirst=true, but not in cluster mode? – Saawan

1 Answer

2 votes

I solved the issue using the Maven Shade plugin, which lets the application ignore the Spark cluster's own jars by relocating the conflicting packages inside the fat jar.

Reference video:

https://youtu.be/WyfHUNnMutg?t=23m1s

I followed the answer given there and added the configuration below. You can even see in the SparkSubmit source code that a jar passed with --jars is appended to the total jar list, so those options never override Spark's jars; they only add yours alongside them:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L644

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <relocation>
                        <pattern>okio</pattern>
                        <shadedPattern>com.shaded.okio</shadedPattern>
                    </relocation>
                    <relocation>
                        <pattern>okhttp3</pattern>
                        <shadedPattern>com.shaded.okhttp3</shadedPattern>
                    </relocation>
                </relocations>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                            <exclude>log4j.properties</exclude>
                        </excludes>
                    </filter>
                </filters>
            </configuration>
        </execution>
    </executions>
</plugin>
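As a sanity check, you can verify the relocation took effect. The shade plugin rewrites the bytecode of the classes in the fat jar, so code compiled against okhttp3.OkHttpClient loads the relocated copy at runtime. A minimal sketch (the ShadeCheck class name is illustrative):

import okhttp3.OkHttpClient; // rewritten to com.shaded.okhttp3.OkHttpClient during shading

public class ShadeCheck {
    public static void main(String[] args) {
        // When run from the shaded fat jar, this prints
        // "com.shaded.okhttp3.OkHttpClient", confirming the relocated copy
        // is in use rather than Spark's okhttp-3.8.1.
        System.out.println(OkHttpClient.class.getName());
    }
}

Listing the jar contents (for example with jar tf) should likewise show the classes under com/shaded/okhttp3 instead of okhttp3.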