
I have a cluster with two workers and one master. To start the master and the workers, I run sbin/start-master.sh and sbin/start-slaves.sh on the master's machine. The master UI then shows the slaves as ALIVE (so everything is OK so far). The issue comes when I want to use spark-submit.
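For reference, this is roughly what I run on the master's machine (the worker addresses listed in conf/slaves are placeholders for my real ones):

sbin/start-master.sh
sbin/start-slaves.sh    # starts a worker on every host listed in conf/slaves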

I execute this command in my local machine:

spark-submit --master spark://<master-ip>:7077 --deploy-mode cluster /home/user/example.jar

But the following error pops up: ERROR ClientEndpoint: Exception from cluster was: java.nio.file.NoSuchFileException: /home/user/example.jar

I have been doing some research on Stack Overflow and in Spark's documentation, and it seems I should specify the application-jar argument of the spark-submit command as a "Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes." (as indicated in https://spark.apache.org/docs/latest/submitting-applications.html).

My question is: how can I make my .jar globally visible inside the cluster? There is a similar question here, Spark Standalone cluster cannot read the files in local filesystem, but its solutions do not work for me.
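From what I understand, the docs are suggesting something along these lines (the namenode address and HDFS paths below are just placeholders, not my actual setup):

hdfs dfs -put /home/user/example.jar /jars/example.jar
spark-submit --master spark://<master-ip>:7077 --deploy-mode cluster hdfs://<namenode>:9000/jars/example.jar

but I do not know whether that is the intended approach or whether there is a simpler way.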

Also, am I doing something wrong by starting the cluster on the master's machine with sbin/start-master.sh but then running spark-submit from my local machine? I start the master from the master's own terminal because that is what I read in Spark's documentation, but maybe this has something to do with the issue. From Spark's documentation:

Once you’ve set up this file, you can launch or stop your cluster with the following shell scripts, based on Hadoop’s deploy scripts, and available in SPARK_HOME/sbin: [...] Note that these scripts must be executed on the machine you want to run the Spark master on, not your local machine.

Thank you very much

EDIT: I have copied the .jar file to every worker and it works. But my point is to know whether there is a better way, since this method forces me to copy the .jar to each worker every time I create a new jar. (This was one of the answers to the question already linked above, Spark Standalone cluster cannot read the files in local filesystem.)
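Right now the copying step looks roughly like this (the worker hostnames are placeholders), and it is this step I would like to avoid repeating for every new build:

scp /home/user/example.jar user@<worker-1>:/home/user/
scp /home/user/example.jar user@<worker-2>:/home/user/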

Have you tried to indicate where to find the jar file with --jars example.jar when running spark-submit? – Oli
Hi Oli, thanks for answering! How would you do it? If I use --jars example.jar after the whole command I wrote above, it still gives me the same error (NoSuchFileException). Whereas if I do not give the above path and instead write --jars example.jar or --jars /home/user/example.jar, it gives me the error: Missing application resource. – meisan
Please try to give the --class option, like the following: spark-submit --master spark://<master-ip>:7077 --deploy-mode cluster --jars /home/user/example.jar --class <your-main-class-name> – sarath kumar
Hi Sarath! Thanks for your answer. I tried it and spark-submit gives me the error Missing application resource. (and offers me the options available with spark-submit) – meisan

1 Answer


@meisan, your spark-submit command is missing 2 things:

  • your jars should be added with the --jars flag
  • the file holding your driver code, i.e. the main function

Now, you have not specified whether you are using Scala or Python, but in a nutshell your command will look something like:

for Python:

spark-submit --master spark://<master>:7077 --deploy-mode cluster --jars <dependency-jars> <python-file-holding-driver-logic>

for Scala:

spark-submit --master spark://<master>:7077 --deploy-mode cluster --class <scala-driver-class> --driver-class-path <application-jar> --jars <dependency-jars>

Also, Spark takes care of sending the required files and jars to the executors when you use the documented flags. If you want to omit the --driver-class-path flag, you can set the environment variable SPARK_CLASSPATH to the path where all your jars are placed.
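As a rough illustration only (the class name and paths below are placeholders you would swap for your own), the Scala variant could look like:

spark-submit --master spark://<master-ip>:7077 --deploy-mode cluster --class com.example.Main --driver-class-path /home/user/example.jar --jars /home/user/dependency.jar

or, using the environment variable instead of --driver-class-path (again, a placeholder path):

export SPARK_CLASSPATH=/home/user/jars/*
spark-submit --master spark://<master-ip>:7077 --deploy-mode cluster --class com.example.Main --jars /home/user/dependency.jar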