9 votes

I've been running my Spark jobs in "client" mode during development, using "--files" to share config files with the executors while the driver read them locally. Now I want to deploy the job in "cluster" mode, and I'm having difficulty sharing the config files with the driver.

For example, I pass the config file name as extraJavaOptions to both the driver and the executors, and read the file using SparkFiles.get():

  val configFile = org.apache.spark.SparkFiles.get(System.getProperty("config.file.name"))

This works well on the executors but fails on the driver. I think the files are only shared with the executors, not with the container where the driver is running. One option is to keep the config files in S3, but I wanted to check whether this can be achieved using spark-submit.

spark-submit --deploy-mode cluster --master yarn --driver-cores 2 \
  --driver-memory 4g --num-executors 4 --executor-cores 4 --executor-memory 10g \
  --files /home/hadoop/Streaming.conf,/home/hadoop/log4j.properties \
  --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
  --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
  --class ....
Were you able to find a solution? I am also trying to solve a similar problem. Please let me know how you handled this scenario. Thanks. - Aditya Agarwal
I faced something similar and posted the answer here; I hope it helps someone: stackoverflow.com/a/62095856/1929092 - Jugal Panchal

2 Answers

4 votes

I found a solution for this problem in this thread.

You can give the file submitted through --files an alias by appending '#alias' to its path. With this trick, you should be able to access the file through that alias.

For example, the following code can run without an error.

spark-submit --master yarn-cluster --files test.conf#testFile.conf test.py

with test.py as:

# The file submitted via --files is localized to the container's working
# directory under its alias, so a relative path is enough.
path_f = 'testFile.conf'
try:
    f = open(path_f, 'r')
except IOError:
    raise Exception('File not opened', 'EEEEEEE!')

and an empty test.conf
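
Applied back to the question, here is a minimal Scala sketch of the same trick (the Streaming.conf name and the -Dconfig.file.name system property are taken from the question; the rest is illustrative). YARN localizes --files into the working directory of both the driver and the executor containers in cluster mode, so the aliased file should be readable by that alias as a relative path:

// Submit with an alias appended to the path, e.g.:
// spark-submit --deploy-mode cluster --master yarn \
//   --files /home/hadoop/Streaming.conf#Streaming.conf \
//   --conf spark.driver.extraJavaOptions="-Dconfig.file.name=Streaming.conf" \
//   --conf spark.executor.extraJavaOptions="-Dconfig.file.name=Streaming.conf" \
//   --class ... app.jar

import scala.io.Source

// Alias passed as a system property (same mechanism as in the question).
val configFileName = System.getProperty("config.file.name")

// In cluster mode the aliased file sits in the container's working directory,
// so a relative path works on the driver; no SparkFiles.get() is needed.
val configLines = Source.fromFile(configFileName).getLines().toList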

3 votes

Try the --properties-file option of the spark-submit command.

For example, a properties file with this content:

spark.key1=value1
spark.key2=value2

All the keys need to be prefixed with spark.

Then use the spark-submit command like this to pass the properties file:

bin/spark-submit --properties-file propertiesfile.properties

Then, in your code, you can get the keys using the SparkContext getConf method:

sc.getConf.get("spark.key1")  // returns value1

Once you have the key values, you can use them everywhere.
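
As a rough sketch of how this could be wired into a job like the one in the question (the spark.key1/spark.key2 keys are the example entries from this answer; the session setup and app name are illustrative):

import org.apache.spark.sql.SparkSession

// Launched with: spark-submit --properties-file propertiesfile.properties ...
val spark = SparkSession.builder().appName("StreamingJob").getOrCreate()
val sc = spark.sparkContext

// Custom settings must carry the spark. prefix to make it into SparkConf.
val value1 = sc.getConf.get("spark.key1")                      // "value1"
val value2 = sc.getConf.getOption("spark.key2").getOrElse("")  // lookup with a default

// The values are plain Strings, so they can be captured in closures and
// used on the executors as well.
val tagged = sc.parallelize(Seq(1, 2, 3)).map(n => s"$value1-$n")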