Our issue (we’re running in yarn-client mode):
This happens with both ./spark-shell:

    scala> sc.parallelize(1 to 1000).collect()

and ./pyspark:

    >>> sc.parallelize([1, 2, 3, 4]).collect()

The shell output from both of these basic jobs shows this error:

    WARN YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
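For reference, a minimal sketch of how the shells are launched in yarn-client mode (assuming a standard Spark layout; --master yarn-client on the command line is one way to select the mode, it could equally come from spark.master in a config file):

    # from the Spark bin directory, matching the snippets above
    ./spark-shell --master yarn-client
    ./pyspark --master yarn-client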
- The YARN logs output this line in continuous succession (the timestamp and port number vary, but every job hits the same error):

    ERROR ExecutorLauncher: Failed to connect to driver at LOCALHOST:<port_number>, retrying ...
- We have SPARK_LOCAL_IP=LOCALHOST set in our spark-env.sh; everything else we tried fails to even instantiate a SparkContext in the REPL (we tried the node's IP address, HOSTNAME, and commenting the line out entirely; see the sketch below).
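The relevant piece of our spark-env.sh looks roughly like this (a sketch; <ip_address> stands in for the node's real IP):

    # spark-env.sh (sketch): only SPARK_LOCAL_IP is non-default
    SPARK_LOCAL_IP=LOCALHOST          # the only value that lets sc instantiate

    # Variants that fail to even create a SparkContext in the REPL:
    # SPARK_LOCAL_IP=<ip_address>     # the node's actual IP
    # SPARK_LOCAL_IP=$HOSTNAME
    # (leaving SPARK_LOCAL_IP unset fails as well)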
Our Setup:
- Using almost all default YARN settings in spark-defaults.conf and spark-env.sh (see the sketch of the effective defaults after this list)
- This is robust hardware:
  - 128 GB per node
  - 16 cores
  - 2 TB physical memory
- The Spark master is on its own node, as are the ResourceManager and the NodeManager (with about 5 data nodes)
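To make "almost all defaults" concrete: nothing resource-related is overridden, so the effective settings should be Spark's built-in YARN defaults, roughly as follows (recalled from the Spark docs; treat the exact values as an assumption, not something we set):

    # spark-defaults.conf (sketch): effective values, none set explicitly
    # spark.executor.instances   2      (YARN default)
    # spark.executor.memory      1g
    # spark.executor.cores       1
    # spark.yarn.am.memory       512m   (yarn-client mode)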
Errors: (in REPL)

    WARN YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

(in YARN logs)

    15/09/12 13:03:00 INFO ExecutorLauncher: Waiting for Spark driver to be reachable.
    15/09/12 13:03:00 ERROR ExecutorLauncher: Failed to connect to driver at LOCALHOST:45790, retrying ...
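For anyone reproducing: the ExecutorLauncher is the YARN application master running on a NodeManager host, and it is trying to open a TCP connection back to the driver at the host:port advertised in the log line. A quick sanity check from a worker node (a sketch, assuming netcat is installed and that LOCALHOST resolves the same way localhost does):

    # Run on a NodeManager host. "LOCALHOST" resolves on *that* machine
    # (to its own loopback), not to the machine running the shell, so the
    # application master can never reach the driver this way.
    nc -vz LOCALHOST 45790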