I am using Spark 2.1 (BTW) on a YARN cluster.

I am trying to upload JARs to the YARN cluster and use them in place of the Spark JARs already installed there.

I am trying to do so through spark-submit.

The question "Add jars to a Spark Job - spark-submit" and its answers are full of interesting points.

One helpful answer is the following one:

spark-submit --jars additional1.jar,additional2.jar \
  --driver-class-path additional1.jar:additional2.jar \
  --conf spark.executor.extraClassPath=additional1.jar:additional2.jar \
  --class MyClass main-application.jar

So, I understand the following:

  • "--jars" uploads the JARs to each node.
  • "--driver-class-path" puts the uploaded JARs on the driver's classpath.
  • "--conf spark.executor.extraClassPath" puts the uploaded JARs on the executors' classpath.

I control the file paths passed to "--jars" in the spark-submit command, but what will the file paths of the uploaded JARs be, for use in "--driver-class-path" for example?

The doc says: "JARs and files are copied to the working directory for each SparkContext on the executor nodes"
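To make the question concrete, here is a minimal sketch of how the bare file names could be derived from a "--jars" list. It assumes each JAR lands in the working directory under its base name, which is my reading of the doc quote and not something I have confirmed. The function name jars_to_relative_classpath is made up for illustration. Note that in client mode the driver runs on the submitting machine rather than in a YARN container, so such relative names could only ever apply to the executors there.

```shell
#!/bin/sh
# Turn a comma-separated --jars list into a colon-separated list of bare file names.
# Assumption: each JAR is copied into the working directory under its base name.
jars_to_relative_classpath() {
  result=""
  old_ifs=$IFS
  IFS=,
  for jar in $1; do
    name=${jar##*/}                      # strip the directory part, keep the file name
    result="${result:+$result:}$name"    # join the names with ':'
  done
  IFS=$old_ifs
  printf '%s\n' "$result"
}

# Paths taken from the command in the question.
jars_to_relative_classpath "/a/b/some1.jar,/a/b/c/some2.jar"
```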

Fine, but for the following command, what should I put instead of XXX and YYY ?

spark-submit --jars /a/b/some1.jar,/a/b/c/some2.jar \
  --driver-class-path XXX:YYY \
  --conf spark.executor.extraClassPath=XXX:YYY \
  --class MyClass main-application.jar

When using spark-submit, how can I reference the "working directory for the SparkContext" to form the XXX and YYY file paths?

Thanks.

PS: I have tried

spark-submit --jars /a/b/some1.jar,/a/b/c/some2.jar \
  --driver-class-path some1.jar:some2.jar \
  --conf spark.executor.extraClassPath=some1.jar:some2.jar  \
  --class MyClass main-application.jar

No success (unless I made a mistake).

I have also tried:

spark-submit --jars /a/b/some1.jar,/a/b/c/some2.jar \
  --driver-class-path ./some1.jar:./some2.jar \
  --conf spark.executor.extraClassPath=./some1.jar:./some2.jar  \
  --class MyClass main-application.jar

No success either.

1 Answer

spark-submit by default uses client mode.

In client mode, you should not use --jars in conjunction with --driver-class-path.

--driver-class-path overwrites the original classpath instead of prepending to it, as one might expect.

--jars automatically adds the extra JARs to the driver and executor classpaths, so you do not need to add their paths manually.

It seems that in cluster mode --driver-class-path is ignored.
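
As a minimal sketch of the advice above, assuming client mode on YARN, the command from the question reduces to passing only --jars. The script below is a dry run: it builds and prints the command instead of running it. The JAR paths, MyClass and main-application.jar are the placeholders from the question; spelling out --master yarn and --deploy-mode client is an assumption about the setup, and build_submit_cmd is a name made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch: build the spark-submit command line and print it.
# JAR paths, MyClass and main-application.jar are placeholders from the question.
JARS="/a/b/some1.jar,/a/b/c/some2.jar"

build_submit_cmd() {
  # Only --jars is passed; no --driver-class-path and no spark.executor.extraClassPath.
  printf '%s' "spark-submit --master yarn --deploy-mode client --jars $1 --class MyClass main-application.jar"
}

SUBMIT_CMD=$(build_submit_cmd "$JARS")
echo "$SUBMIT_CMD"
# To actually submit, run the printed command on a machine with Spark installed.
```

If the classes from the extra JARs are still not picked up, checking the classpath entries shown in the "Environment" tab of the Spark UI is one way to see what the driver and executors actually received.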