
Our issue: (we’re running yarn-client)

  • This happens for both ./spark-shell

    scala> sc.parallelize(1 to 1000).collect()
    

    and ./pyspark

    >>> sc.parallelize([1,2,3,4]).collect()
    
  • The shell output from the basic jobs above outputs this error

    WARN YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

  • The YARN logs output this line in continuous succession (the time_stamp and port_number change, but each job produces this error):

    ERROR ExecutorLauncher: Failed to connect to driver at LOCALHOST:<port_number>, retrying ...

  • We have SPARK_LOCAL_IP=LOCALHOST set in our spark-env.sh (every other value we tried fails to even instantiate a SparkContext in the REPL; examples: the node's ip_address, HOSTNAME, and commenting the line out entirely)

Our Setup:

  • Using almost all default YARN settings in the spark-defaults.conf and spark-env.sh
  • This is robust hardware:
    • 128 GB per node,
    • 16 cores,
    • 2 TB physical memory
  • The Spark master is on its own node, as are the ResourceManager and the NodeManager (with about 5 data nodes)

Errors: (in REPL)

WARN YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

(in YARN logs)

15/09/12 13:03:00 INFO ExecutorLauncher: Waiting for Spark driver to be reachable.
15/09/12 13:03:00 ERROR ExecutorLauncher: Failed to connect to driver at LOCALHOST:45790, retrying ...

Can you share the spark-submit command being used to launch this? - Adi Kish

2 Answers

0
votes

The last error log is the real clue: the executors are trying to contact the driver at localhost:45790, but nothing is listening there.

What you want to do is set spark.driver.host and spark.driver.port in /path/to/spark/conf/spark-defaults.conf to an address and port your executors can actually reach.
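As a sketch, the relevant entries in spark-defaults.conf might look like this (the address and port are illustrative; substitute the driver host's routable IP):

```properties
# Hypothetical example values — replace with the driver host's routable address.
# Executors must be able to resolve and reach this host:port combination.
spark.driver.host    192.168.1.10
spark.driver.port    45790
```

The important part is that spark.driver.host is an address the cluster nodes can route to, not localhost or the loopback address.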

0
votes

You should actually set SPARK_LOCAL_IP to your host's real IP address (i.e., one other than the loopback address).

So if your ip is xxx.xxx.xxx.xxx

    export SPARK_LOCAL_IP=xxx.xxx.xxx.xxx

And then make sure that the driver is actually running.
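A minimal sketch of that spark-env.sh fragment, assuming a Linux host where `hostname -I` is available (the fallback value is only illustrative):

```shell
# Pick the first non-loopback IPv4 address reported by the host.
# `hostname -I` lists the host's addresses on Linux; adjust for other platforms.
LOCAL_IP=$(hostname -I 2>/dev/null | awk '{print $1}')

# Fall back to the loopback address only if detection failed entirely.
export SPARK_LOCAL_IP="${LOCAL_IP:-127.0.0.1}"

echo "SPARK_LOCAL_IP=$SPARK_LOCAL_IP"
```

Hardcoding the address, as in the export line above, is equally valid; the detection snippet just avoids baking a node-specific IP into a shared config file.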