We are migrating our Spark workloads from Cloudera to Kubernetes.
For demo purposes, we wish to run one of our Spark jobs within a minikube cluster, using spark-submit in cluster mode.
I would like to pass a Typesafe Config file to my executors using the spark.files conf (I tried --files as well). The configuration file was copied into the Spark Docker image at build time, under the /opt/spark/conf directory.
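For context, the job reads the configuration through Typesafe Config in the standard way; here is a minimal sketch (the Settings object and the app.input.path key are illustrative, not the actual code):

import com.typesafe.config.{Config, ConfigFactory}

object Settings {
  // ConfigFactory.load() honors the -Dconfig.file system property when it is
  // set, and otherwise falls back to application.conf on the classpath.
  val config: Config = ConfigFactory.load()

  // Illustrative key; the real file defines our own settings.
  val inputPath: String = config.getString("app.input.path")
}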
Yet when I submit my job, I get a java.io.FileNotFoundException: File file:/opt/spark/conf/application.conf does not exist.
My understanding is that spark.files copies the files from the driver to the executors' working directory, as sketched below.
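For reference, this is the behavior I am relying on: a file shipped through spark.files (or --files) should be resolvable on the executors via SparkFiles.get. A sketch of what I expect to work (the object name and the dummy RDD are just for illustration):

import java.io.File

import com.typesafe.config.ConfigFactory
import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

object ShippedConfigSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shipped-config-sketch").getOrCreate()

    // Runs on the executors: SparkFiles.get returns the absolute path of a
    // shipped file inside the executor's working directory.
    spark.sparkContext.parallelize(1 to 10, numSlices = 2).foreachPartition { _ =>
      val path = SparkFiles.get("application.conf")
      val config = ConfigFactory.parseFile(new File(path)).resolve()
      println(config.root().render())
    }

    spark.stop()
  }
}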
Am I missing something? Thank you for your help.
Here is my spark-submit command:
spark-submit \
--master k8s://https://192.168.49.2:8443 \
--driver-memory ${SPARK_DRIVER_MEMORY} --executor-memory ${SPARK_EXECUTOR_MEMORY} \
--deploy-mode cluster \
--class "${MAIN_CLASS}" \
--conf spark.driver.defaultJavaOptions="-Dconfig.file=local://${POD_CONFIG_DIR}/application.conf $JAVA_ARGS" \
--conf spark.files="file:///${POD_CONFIG_DIR}/application.conf,file:///${POD_CONFIG_DIR}/tlereg.properties" \
--conf spark.executor.defaultJavaOptions="-Dconfig.file=local://./application.conf" \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=$SPARK_CONTAINER_IMAGE \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kryoserializer.buffer.max=512M \
--conf spark.driver.maxResultSize=8192M \
--conf spark.kubernetes.authenticate.caCertFile=$HOME/.minikube/ca.crt \
--conf spark.executor.extraClassPath="./" \
local:///path/to/uber/jar.jar \
"${PROG_ARGS[@]}" > $LOG_FILE 2>&1
Edit: I ran
docker run -it --rm spark:2.4.5 bash
and checked that application.conf has indeed been copied to /opt/spark/conf inside the image.