I've been running my Spark jobs in "client" mode during development. I use "--files" to share config files with the executors, and the driver reads its config files from the local filesystem. Now I want to deploy the job in "cluster" mode, and I'm having difficulty sharing the config files with the driver.
For example, I pass the config file name to both the driver and the executors via extraJavaOptions, and I read the file using SparkFiles.get():
val configFile = org.apache.spark.SparkFiles.get(System.getProperty("config.file.name"))
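Fleshed out a little, the lookup looks roughly like this (parsing the .conf file with Typesafe Config is just an assumption on my part; substitute whatever the file is actually read with):

    import java.io.File
    import org.apache.spark.SparkFiles
    import com.typesafe.config.ConfigFactory  // assumed; Streaming.conf may be read differently

    // The file name arrives via -Dconfig.file.name (see the spark-submit command below)
    // and is resolved against the directory where SparkFiles places distributed files.
    val fileName   = System.getProperty("config.file.name")   // e.g. "Streaming.conf"
    val configPath = SparkFiles.get(fileName)                  // absolute path on the executor
    val config     = ConfigFactory.parseFile(new File(configPath))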
That lookup works well on the executors but fails on the driver. I think the files are only shared with the executors and not with the container where the driver is running. One option is to keep the config files in S3, but I wanted to check whether this can be achieved using spark-submit. Here is the command I'm using:
> spark-submit --deploy-mode cluster --master yarn --driver-cores 2 \
>   --driver-memory 4g --num-executors 4 --executor-cores 4 --executor-memory 10g \
>   --files /home/hadoop/Streaming.conf,/home/hadoop/log4j.properties \
>   --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
>   --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
>   --class ....
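For reference, the kind of lookup I'd need on the driver would be something like the sketch below. It assumes (and this is the part I'm unsure about) that in cluster mode YARN also localizes the --files entries into the driver container's working directory, so the bare file name can serve as a fallback when SparkFiles.get() doesn't resolve to an existing file:

    import java.io.File
    import org.apache.spark.SparkFiles

    // Try the SparkFiles root first (known to work on executors); fall back to the
    // bare file name, which should resolve in the YARN container's working
    // directory if --files entries are localized there for the driver as well.
    // Assumes a SparkContext has already been created.
    def resolveConfigFile(name: String): File = {
      val fromSparkFiles = new File(SparkFiles.get(name))
      if (fromSparkFiles.exists()) fromSparkFiles else new File(name)
    }

    val configFile = resolveConfigFile(System.getProperty("config.file.name"))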