3 votes

We are migrating our Spark workloads from Cloudera to Kubernetes.

For demo purposes, we want to run one of our Spark jobs in a minikube cluster using spark-submit in cluster mode.

I would like to pass a Typesafe config file to my executors using the spark.files conf (I tried --files as well). The configuration file was copied into the Spark Docker image at build time, under the /opt/spark/conf directory.
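
I verified this by running a shell in the image; a one-liner sketch of the same check:

    # list the conf directory baked into the image (spark:2.4.5 is the image I use)
    docker run --rm spark:2.4.5 ls -la /opt/spark/conf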

Yet when I submit my job, I get a java.io.FileNotFoundException: File file:/opt/spark/conf/application.conf does not exist.

My understanding is that spark.files copies the files from the driver to the executors' working directory.

Am I missing something? Thank you for your help.

Here is my spark-submit command:

spark-submit \
        --master k8s://https://192.168.49.2:8443 \
        --driver-memory ${SPARK_DRIVER_MEMORY} --executor-memory ${SPARK_EXECUTOR_MEMORY} \
        --deploy-mode cluster \
        --class "${MAIN_CLASS}" \
        --conf spark.driver.defaultJavaOptions="-Dconfig.file=local://${POD_CONFIG_DIR}/application.conf $JAVA_ARGS" \
        --conf spark.files="file:///${POD_CONFIG_DIR}/application.conf,file:///${POD_CONFIG_DIR}/tlereg.properties" \
        --conf spark.executor.defaultJavaOptions="-Dconfig.file=local://./application.conf" \
        --conf spark.executor.instances=5 \
        --conf spark.kubernetes.container.image=$SPARK_CONTAINER_IMAGE \
        --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
        --conf spark.kryoserializer.buffer.max=512M \
        --conf spark.driver.maxResultSize=8192M \
        --conf spark.kubernetes.authenticate.caCertFile=$HOME/.minikube/ca.crt \
        --conf spark.executor.extraClassPath="./" \
        local:///path/to/uber/jar.jar \
        "${PROG_ARGS[@]}" > $LOG_FILE 2>&1
I suggest you ensure that the files were copied into /opt/spark/conf/ during the Docker image build by adding a line that prints the contents of the directory. – SMA

I tried docker run -it --rm spark:2.4.5 bash and checked that application.conf has been copied to /opt/spark/conf. – cnemri

Try --conf spark.files="${POD_CONFIG_DIR}/application.conf,${POD_CONFIG_DIR}/tlereg.properties" and, alternatively, --conf spark.files="file://${POD_CONFIG_DIR}/application.conf,file://${POD_CONFIG_DIR}/tlereg.properties" – SMA

I tried both. They yield: ERROR SparkContext: Error initializing SparkContext. java.io.FileNotFoundException: File file:/opt/spark/conf/application.conf does not exist. Am I right in saying that those files are copied from the driver pod (built from the spark:2.4.5 image, which I am sure contains the config files)? Otherwise, where does it look for those files? Is the behaviour of spark.files different depending on whether we are in client or cluster mode? Thank you in advance for your help. – cnemri

1 Answer

2 votes

I've figured it out. spark-submit sends a request to the Kubernetes API server to create the driver pod. Spark mounts a ConfigMap volume into that pod at mountPath: /opt/spark/conf, which shadows the config files I had baked into the Docker image at the same path. Workaround: change /opt/spark/conf to /opt/spark/config in the Dockerfile, so that my configuration files are copied to, and read from, the latter directory.
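
For reference, a rough way to confirm the shadowing (the pod name is a placeholder, with kubectl pointed at the minikube cluster):

    # the driver pod spec contains a ConfigMap volume mounted at /opt/spark/conf
    kubectl get pod <driver-pod-name> -o yaml | grep -A 2 'mountPath: /opt/spark/conf'
    # inside the pod, only the ConfigMap entries are visible at that path;
    # the files baked into the image are hidden by the mount
    kubectl exec <driver-pod-name> -- ls -la /opt/spark/conf

With the Dockerfile copying the files to /opt/spark/config instead, and POD_CONFIG_DIR updated to match, the same spark-submit command picks them up from a path the ConfigMap mount does not override.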