Our issue (we’re running in yarn-client mode):
This happens with both ./spark-shell:

    scala> sc.parallelize(1 to 1000).collect()

and ./pyspark:

    >>> sc.parallelize([1, 2, 3, 4]).collect()

The shell output from both of these basic jobs shows this error:

    WARN YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
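For reference, a minimal sketch of how the shells are launched in yarn-client mode (assuming a standard Spark layout; --master yarn-client on the command line is one way to select the mode, it could equally come from spark.master in a config file):

    # from the Spark bin directory, matching the snippets above
    ./spark-shell --master yarn-client
    ./pyspark --master yarn-client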
- The YARN logs output this line in continuous succession (the timestamp and port number vary, but every job hits the same error):

    ERROR ExecutorLauncher: Failed to connect to driver at LOCALHOST:<port_number>, retrying ...
- We have SPARK_LOCAL_IP=LOCALHOST set in our spark-env.sh; everything else we tried fails to even instantiate a SparkContext in the REPL (we tried the node's IP address, HOSTNAME, and commenting the line out entirely; see the sketch below).
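The relevant piece of our spark-env.sh looks roughly like this (a sketch; <ip_address> stands in for the node's real IP):

    # spark-env.sh (sketch): only SPARK_LOCAL_IP is non-default
    SPARK_LOCAL_IP=LOCALHOST          # the only value that lets sc instantiate

    # Variants that fail to even create a SparkContext in the REPL:
    # SPARK_LOCAL_IP=<ip_address>     # the node's actual IP
    # SPARK_LOCAL_IP=$HOSTNAME
    # (leaving SPARK_LOCAL_IP unset fails as well)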
Our Setup:
- Using almost all default YARN settings in spark-defaults.conf and spark-env.sh (see the sketch of the effective defaults after this list)
- This is robust hardware:
  - 128 GB per node
  - 16 cores
  - 2 TB physical memory
- The Spark master is on its own node, as are the ResourceManager and the NodeManager (with about 5 data nodes)
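To make "almost all defaults" concrete: nothing resource-related is overridden, so the effective settings should be Spark's built-in YARN defaults, roughly as follows (recalled from the Spark docs; treat the exact values as an assumption, not something we set):

    # spark-defaults.conf (sketch): effective values, none set explicitly
    # spark.executor.instances   2      (YARN default)
    # spark.executor.memory      1g
    # spark.executor.cores       1
    # spark.yarn.am.memory       512m   (yarn-client mode)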
Errors: (in REPL)

    WARN YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

(in YARN logs)

    15/09/12 13:03:00 INFO ExecutorLauncher: Waiting for Spark driver to be reachable.
    15/09/12 13:03:00 ERROR ExecutorLauncher: Failed to connect to driver at LOCALHOST:45790, retrying ...
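For anyone reproducing: the ExecutorLauncher is the YARN application master running on a NodeManager host, and it is trying to open a TCP connection back to the driver at the host:port advertised in the log line. A quick sanity check from a worker node (a sketch, assuming netcat is installed and that LOCALHOST resolves the same way localhost does):

    # Run on a NodeManager host. "LOCALHOST" resolves on *that* machine
    # (to its own loopback), not to the machine running the shell, so the
    # application master can never reach the driver this way.
    nc -vz LOCALHOST 45790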