9 votes

What is the precedence in class loading when both the uber jar of my Spark application and the contents of the --jars option to my spark-submit shell command contain the same dependencies?

I ask this from a third-party library integration standpoint. If I set --jars to use a third-party library at version 2.0 and the uber jar passed to this spark-submit script was assembled with version 2.1, which class is loaded at runtime?

At present, I am thinking of keeping my dependencies on HDFS and adding them to the --jars option of spark-submit, while asking users (via some end-user documentation) to set the scope of this third-party library to 'provided' in their Spark application's Maven POM file.
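For reference, a minimal sketch of what that deployment might look like (the HDFS path, jar names, and main class are hypothetical placeholders):

    # Hypothetical example: the third-party jar is staged on HDFS and passed via --jars,
    # while the application's uber jar was built with that library marked as 'provided'.
    spark-submit \
      --class com.example.MyApp \
      --master yarn \
      --deploy-mode cluster \
      --jars hdfs:///libs/third-party-lib-2.0.jar \
      my-app-uber.jar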


1 Answer

16 votes

This is somewhat controlled with params:

  • spark.driver.userClassPathFirst &
  • spark.executor.userClassPathFirst

If set to true (the default is false), the docs say:

(Experimental) Whether to give user-added jars precedence over Spark's own jars when loading classes in the driver. This feature can be used to mitigate conflicts between Spark's dependencies and user dependencies. It is currently an experimental feature. This is used in cluster mode only.

I wrote some of the code that controls this, and there were a few bugs in the early releases, but if you're using a recent Spark release it should work (although it is still an experimental feature).
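To illustrate, a hedged sketch of passing these flags on spark-submit (the jar names, HDFS path, and main class are placeholders, and exact behavior may vary by Spark version since the feature is experimental):

    # Hypothetical example: ask both the driver and the executors to prefer
    # user-added jars over Spark's own jars when resolving classes.
    spark-submit \
      --class com.example.MyApp \
      --master yarn \
      --deploy-mode cluster \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      --jars hdfs:///libs/third-party-lib-2.0.jar \
      my-app-uber.jar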