3
votes

I am running Apache Spark in cluster mode on Apache Mesos. However, when I start spark-shell and run a simple test command (sc.parallelize(0 to 10, 8).count), I receive the following warning:

16/03/10 11:50:55 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

In the Mesos web UI I can see that spark-shell is listed as a framework and that one slave (my own machine) is registered. How can I troubleshoot this?

1
Have you installed Spark on the slaves? The worker nodes need to have Spark installed. - avr
Try looking for further issues in the mesos-slave logs. - hbogert

1 Answer

3
votes

While most other answers focus on resource allocation (cores, memory) on the Spark slaves, I would like to point out that a firewall can cause exactly the same issue, especially when you are running Spark on a cloud platform.

If the Spark slaves show up in the web UI, you have probably already opened the standard ports 8080, 8081, 7077 and 4040. However, when you actually run a job, Spark also uses SPARK_WORKER_PORT, spark.driver.port and spark.blockManager.port, which are assigned randomly by default. If your firewall blocks these ports, the master cannot receive any job-specific responses from the slaves and reports this message.
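One way to make these ports firewall-friendly is to pin them to fixed values instead of leaving them random. The following is only a minimal sketch in Python that assembles a spark-shell command line with --conf settings; the port numbers 40000 and 40001 and the helper name spark_shell_command are arbitrary choices for illustration, not values taken from the question.

```python
# Hypothetical fixed ports (assumption): choose any free ports
# that your firewall rules allow between driver and slaves.
FIXED_PORTS = {
    "spark.driver.port": 40000,
    "spark.blockManager.port": 40001,
}


def spark_shell_command(ports):
    """Build a spark-shell argument list that pins the given port settings."""
    args = ["spark-shell"]
    for key, port in ports.items():
        # Each property is passed as: --conf key=value
        args += ["--conf", f"{key}={port}"]
    return args


command = spark_shell_command(FIXED_PORTS)
print(" ".join(command))
```

With the ports pinned like this, the firewall only needs rules for those specific ports rather than for an unpredictable range.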

As a quick test, open all ports and see whether the slave then accepts jobs.
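If opening every port is not an option, a narrower check is to test whether individual TCP ports are reachable. This is a rough sketch using only the Python standard library; the function names port_is_reachable and check_ports are made up for this example, and the host and ports you pass in depend on your own setup.

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports):
    """Map each port to whether a TCP connection to it could be opened."""
    return {port: port_is_reachable(host, port) for port in ports}
```

Run it from the machine that initiates the connection (for example, from a slave towards the driver host for spark.driver.port), and only while the Spark application is running, since nothing listens on those ports otherwise. A result of False for a port that should be listening suggests that a firewall, or something else on the network path, is blocking it.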