Spark Python submission error : File does not exist: pyspark.zip

Question

I'm trying to submit a Python Spark application in yarn-cluster mode, launching spark-submit from Scala:

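import scala.sys.process._  // provides the `!` operator used below

// Run spark-submit as an external process and block until it exits.
Seq(
  System.getenv("SPARK_HOME") + "/bin/spark-submit",
  "--master", sparkConfig.getString("spark.master"),
  "--executor-memory", sparkConfig.getString("spark.executor-memory"),
  "--num-executors", sparkConfig.getString("spark.num-executors"),
  "python/app.py"
).!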

I'm getting the following error:

Diagnostics: File does not exist: hdfs://xxxxxx:8020/user/hdfs/.sparkStaging/application_123456789_0138/pyspark.zip java.io.FileNotFoundException: File does not exist: hdfs://xxxxxx:8020/user/hdfs/.sparkStaging/application_123456789_0138/pyspark.zip

I found https://issues.apache.org/jira/browse/SPARK-10795, but the ticket is still open.

Neeraj Jain · Accepted Answer · 2016-09-15T18:22:57

This happens when you spark-submit a job with deploy-mode "cluster" while the application code itself sets the master to "local", e.g.

val sparkConf = new SparkConf().setAppName("spark-pi-app").setMaster("local[10]");

You have two options.

Option #1: Remove the setMaster call, changing the above line to:

val sparkConf = new SparkConf().setAppName("spark-pi-app");

and submit your job with the master and deploy mode given on the command line:

./bin/spark-submit --master yarn --deploy-mode cluster --driver-memory 512m --executor-memory 512m --executor-cores 1 --num-executors 3 --jars hadoop-common-{version}.jar,hadoop-lzo-{version}.jar --verbose --queue hadoop-queue --class "SparkPi" sparksbtproject_2.11-1.0.jar

Option #2: Submit your job with deploy-mode "client":

./bin/spark-submit --master yarn --deploy-mode client --driver-memory 512m --executor-memory 512m --executor-cores 1 --num-executors 3 --jars hadoop-common-{version}.jar,hadoop-lzo-{version}.jar --verbose --queue hadoop-queue --class "SparkPi" sparksbtproject_2.11-1.0.jar
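The two commands above differ only in the value of --deploy-mode. When spark-submit is launched from Scala, as in the question, one way to make that choice explicit is to build the argument list in a small helper. The following is only a sketch: SubmitCommand and its parameter names are hypothetical and not part of Spark, and it assumes the flags shown in the commands above (--master, --deploy-mode, --executor-memory, --num-executors).

```scala
// Hypothetical helper (not a Spark API): assembles the spark-submit
// argument list so that master and deploy mode are always passed on the
// command line instead of being hard-coded in the application.
object SubmitCommand {
  def build(
      sparkHome: String,
      master: String,
      deployMode: String,
      executorMemory: String,
      numExecutors: String,
      appResource: String
  ): Seq[String] = {
    // Only the two deploy modes discussed above are accepted.
    require(
      deployMode == "client" || deployMode == "cluster",
      s"unsupported deploy mode: $deployMode"
    )
    Seq(
      s"$sparkHome/bin/spark-submit",
      "--master", master,
      "--deploy-mode", deployMode,
      "--executor-memory", executorMemory,
      "--num-executors", numExecutors,
      appResource
    )
  }
}
```

With import scala.sys.process._ in scope, the resulting Seq can be run the same way as in the question, for example SubmitCommand.build(System.getenv("SPARK_HOME"), "yarn", "cluster", "512m", "3", "python/app.py").! This only controls what is passed to spark-submit; the application itself should still avoid calling setMaster("local[...]"), as described in Option #1.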
