0 votes

I am running a Spark job in standalone mode. I have configured my worker node to connect to the master node, and they connect successfully, but when I run the job on the Spark master the job is not getting distributed. I keep getting the following message:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I have tried running the job as local on the worker node and it runs fine, which means resources are available. The Spark master UI also shows that the worker has accepted the job. Passwordless SSH is enabled in both directions between the master and the worker node. I feel it might be a firewall issue, or maybe the Spark driver port is not open. My worker node logs show:

16/03/21 10:05:40 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-7-oracle/bin/java" "-cp" "/mnt/pd1/spark/spark-1.5.0-bin-hadoop2.6/sbin/../conf/:/mnt/pd1/spark/spark-1.5.0-bin-hadoop2.6/lib/spark-assembly-1.5.0-hadoop2.6.0.jar:/mnt/pd1/spark/spark-1.5.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/mnt/pd1/spark/spark-1.5.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/mnt/pd1/spark/spark-1.5.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar" "-Xms8192M" "-Xmx8192M" "-Dspark.driver.port=51810" "-Dspark.cassandra.connection.port=9042" "-XX:MaxPermSize=256m" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://[email protected]:51810/user/CoarseGrainedScheduler" "--executor-id" "2" "--hostname" "10.0.1.194" "--cores" "4" "--app-id" "app-20160321100135-0001" "--worker-url" "akka.tcp://[email protected]:39423/user/Worker"

The executor on the worker node shows the following log in stderr:

16/03/21 10:13:52 INFO Slf4jLogger: Slf4jLogger started
16/03/21 10:13:52 INFO Remoting: Starting remoting
16/03/21 10:13:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:59715]
16/03/21 10:13:52 INFO Utils: Successfully started service 'driverPropsFetcher' on port 59715.
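One way to test the driver-port suspicion is a plain TCP connect from the worker to the driver address shown in the launch command above (10.0.1.192:51810). The sketch below is only a connectivity check, not part of the Spark job itself:

import java.net.{InetSocketAddress, Socket}

// Try to open a TCP connection from the worker to the driver port
// taken from the launch command above (10.0.1.192:51810).
val socket = new Socket()
try {
  socket.connect(new InetSocketAddress("10.0.1.192", 51810), 5000) // 5 s timeout
  println("Driver port is reachable from this worker")
} catch {
  case e: Exception => println(s"Driver port is NOT reachable: ${e.getMessage}")
} finally {
  socket.close()
}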

You need a resource manager; if you run in standalone mode alone, the job won't be distributed. – eliasah
Spark standalone mode is a cluster manager. I am running my job on another cluster with 3 workers and 1 master node and it works fine. I feel it might be a firewall issue. How can I figure out which Spark driver port is being used? – Y0gesh Gupta
Another way for this to happen is asking for an executor memory size bigger than the RAM on the machine. – Randall Whitman

2 Answers

0 votes

You can specify a fixed driver port in the Spark context:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().set("spark.driver.port", "51810") // pins the spark.driver.port property
val sc = new SparkContext(conf)

PS: When you start the Spark worker manually on the worker machine and connect it to the master, you don't need any passwordless authentication or similar between master and worker. That would only be necessary if you used the master to start all slaves (start-slaves.sh). So this shouldn't be the problem.

0 votes

Many people hit this issue when setting up a new cluster. If you can see the Spark slaves in the web UI but they are not accepting jobs, there is a high chance that a firewall is blocking the communication. Take a look at my other answer: Apache Spark on Mesos: Initial job has not accepted any resources:

While most of the other answers focus on resource allocation (cores, memory) on the Spark slaves, I would like to highlight that a firewall can cause exactly the same issue, especially when you are running Spark on cloud platforms.

If you can see the Spark slaves in the web UI, you have probably opened the standard ports 8080, 8081, 7077, 4040. Nonetheless, when you actually run a job, it uses SPARK_WORKER_PORT, spark.driver.port and spark.blockManager.port, which by default are randomly assigned. If your firewall is blocking these ports, the master cannot retrieve any job-specific responses from the slaves and returns the error.

You can run a quick test by opening all the ports and checking whether the slaves accept jobs.
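If that test confirms the firewall is the cause, one option is to pin the otherwise random ports to fixed values and open only those. The sketch below uses arbitrary example port numbers and a hypothetical application name; note that SPARK_WORKER_PORT is an environment variable for the worker daemon (set in conf/spark-env.sh), not a SparkConf property:

import org.apache.spark.{SparkConf, SparkContext}

// Pin the normally random ports so fixed firewall rules can be written
// for them. The port numbers below are arbitrary examples, not defaults.
val conf = new SparkConf()
  .setAppName("fixed-ports-example")        // hypothetical application name
  .set("spark.driver.port", "51810")        // driver RPC port
  .set("spark.blockManager.port", "51811")  // block manager port on driver and executors

val sc = new SparkContext(conf)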