9 votes

I've been running my Spark jobs in "client" mode during development, using "--files" to share config files with the executors while the driver read them locally. Now I want to deploy the job in "cluster" mode, and I'm having difficulty sharing the config files with the driver.

For example, I pass the config file name as extraJavaOptions to both the driver and the executors, and read the file using SparkFiles.get():

  val configFile = org.apache.spark.SparkFiles.get(System.getProperty("config.file.name"))

This works well on the executors but fails on the driver. I think the files are only shared with the executors, not with the container where the driver is running. One option is to keep the config files in S3, but I wanted to check whether this can be achieved using spark-submit.

spark-submit --deploy-mode cluster --master yarn --driver-cores 2 \
  --driver-memory 4g --num-executors 4 --executor-cores 4 --executor-memory 10g \
  --files /home/hadoop/Streaming.conf,/home/hadoop/log4j.properties \
  --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
  --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
  --class ....
Were you able to find a solution? I am also trying to solve a similar problem. Please let me know how you handled this scenario. Thanks. - Aditya Agarwal
I faced something similar and posted the answer here; I hope it helps someone: stackoverflow.com/a/62095856/1929092 - Jugal Panchal

2 Answers

4 votes

I found a solution for this problem in this thread.

You can give the file submitted through --files an alias by appending '#alias' to its path. With this trick, you should be able to access the file through that alias.

For example, the following code can run without an error.

spark-submit --master yarn-cluster --files test.conf#testFile.conf test.py

with test.py as:

# The file submitted via --files is localized to the container's working
# directory under its alias, so a relative path is enough.
path_f = 'testFile.conf'
try:
    f = open(path_f, 'r')
except IOError:
    raise Exception('File not opened', 'EEEEEEE!')

and an empty test.conf
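
Applied back to the question, here is a minimal Scala sketch of the same trick (the Streaming.conf name and the -Dconfig.file.name system property are taken from the question; the rest is illustrative). YARN localizes --files into the working directory of both the driver and the executor containers in cluster mode, so the aliased file should be readable by that alias as a relative path:

// Submit with an alias appended to the path, e.g.:
// spark-submit --deploy-mode cluster --master yarn \
//   --files /home/hadoop/Streaming.conf#Streaming.conf \
//   --conf spark.driver.extraJavaOptions="-Dconfig.file.name=Streaming.conf" \
//   --conf spark.executor.extraJavaOptions="-Dconfig.file.name=Streaming.conf" \
//   --class ... app.jar

import scala.io.Source

// Alias passed as a system property (same mechanism as in the question).
val configFileName = System.getProperty("config.file.name")

// In cluster mode the aliased file sits in the container's working directory,
// so a relative path works on the driver; no SparkFiles.get() is needed.
val configLines = Source.fromFile(configFileName).getLines().toList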

3 votes

Try the --properties-file option of the spark-submit command.

For example, a properties file with this content:

spark.key1=value1
spark.key2=value2

All the keys need to be prefixed with spark.

Then use the spark-submit command like this to pass the properties file:

bin/spark-submit --properties-file propertiesfile.properties

Then, in your code, you can get the keys using the SparkContext getConf method:

sc.getConf.get("spark.key1")  // returns value1

Once you have the key values, you can use them everywhere.
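
As a rough sketch of how this could be wired into a job like the one in the question (the spark.key1/spark.key2 keys are the example entries from this answer; the session setup and app name are illustrative):

import org.apache.spark.sql.SparkSession

// Launched with: spark-submit --properties-file propertiesfile.properties ...
val spark = SparkSession.builder().appName("StreamingJob").getOrCreate()
val sc = spark.sparkContext

// Custom settings must carry the spark. prefix to make it into SparkConf.
val value1 = sc.getConf.get("spark.key1")                      // "value1"
val value2 = sc.getConf.getOption("spark.key2").getOrElse("")  // lookup with a default

// The values are plain Strings, so they can be captured in closures and
// used on the executors as well.
val tagged = sc.parallelize(Seq(1, 2, 3)).map(n => s"$value1-$n")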