0 votes

I've been using Spark for a couple of weeks now on a cluster set up on Digital Ocean, with one master and one slave, but I keep getting the same error: "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources". I have to ask because no answer here or elsewhere on the internet has solved this.

So I'm trying this command, both on my computer and on the master:

./bin/pyspark --master spark://<MASTER-IP>:7077

The shell launches correctly, but if I test it with this example:

sc.parallelize(range(10)).count()

I get the error.

I'm sure it's not a resource problem, because I can launch the shell from both nodes and create RDDs without a problem, the memory and core variables are set in spark-env.sh, and the master and slave can communicate with each other over SSH. I've read that the cause could be the slave being unable to communicate back to the driver, which in my case would be either my computer or the master.
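For what it's worth, one way to rule out a genuine resource shortage is to request deliberately small resources when launching the shell (the values below are just examples):

./bin/pyspark --master spark://<MASTER-IP>:7077 --executor-memory 512m --total-executor-cores 1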

What does the cluster UI say? Also, can the slave talk to the machine where the python console is launched? - Reactormonk
I can see the worker and the PySparkShell running on the UI. The thing is that when I run the console on my PC, spark.driver.host shows a private IP. - eftov

3 Answers

0 votes

The Spark slave nodes and the master must be able to communicate back to the driver, i.e. you need to open ports on your PC to the cluster (preferably allowing only the specific node IPs in the cluster).

If this is the case, you should see a connection error in the stderr logs of the worker nodes.

Refer to the Spark security documentation for more detail on configuring ports.
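By default the driver listens on random ports, so a minimal sketch is to pin them and open only those (the property names are standard Spark settings; the IPs and port numbers below are placeholders to adapt):

./bin/pyspark --master spark://<MASTER-IP>:7077 \
  --conf spark.driver.host=<IP-THE-WORKERS-CAN-REACH> \
  --conf spark.driver.port=7001 \
  --conf spark.blockManager.port=7002

# then allow only the cluster nodes to reach those ports, e.g. with ufw
sudo ufw allow from <WORKER-IP> to any port 7001
sudo ufw allow from <WORKER-IP> to any port 7002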

0 votes

I got it working by setting up a private network and setting SPARK_LOCAL_IP in spark-env.sh. Now it works if I run the Spark shell on the master after SSHing into it, but still not from outside that network, even with the ports open. So I wonder whether it is really possible to run a shell remotely.
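Roughly, the change amounts to something like the following in conf/spark-env.sh on each node (the value is a placeholder for that node's private-network IP; restart the master and the worker afterwards):

# conf/spark-env.sh
export SPARK_LOCAL_IP=<PRIVATE-IP-OF-THIS-NODE>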

0 votes

Spark on YARN can run in two modes:

  1. cluster mode - the Spark driver runs inside the cluster, in the YARN ApplicationMaster.
  2. client mode - the Spark driver runs on the client side, where the interactive shell is launched.

Cluster mode is not well suited to using Spark interactively, as with pyspark. Spark applications that require user input, such as spark-shell and pyspark, need the Spark driver to run inside the client process that initiates the Spark application.

Client mode can be set in the environment as below:
export PYSPARK_SUBMIT_ARGS='--master yarn --deploy-mode client pyspark-shell'
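Equivalently, the same options can be passed directly when launching the shell:

./bin/pyspark --master yarn --deploy-mode client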