3
votes

My question is similar to those of other posters reporting "Initial job has not accepted any resources". I read their suggestions and am still not able to submit the job from Java. I am wondering if somebody with more experience installing Spark sees an obvious miss or knows how to troubleshoot this?

Spark: check your cluster UI to ensure that workers are registered.

My configuration is as follows: (Fedora VM) MASTER: Spark 2.0.2, prebuilt with Hadoop. WORKER: single instance.

(Windows host Java app) The client is a sample Java app, configured with:

conf.set("spark.cores.max","1");
conf.set("spark.shuffle.service.enabled", "false");
conf.set("spark.dynamicAllocation.enabled", "false");

Attached is a snapshot of the Spark UI. As far as I can tell, my job is received, submitted, and running. It also appears that I am not over-utilizing CPU or RAM.


The Java (client) console reports:

12:15:47.816 DEBUG parentName: , name: TaskSet_0, runningTasks: 0
12:15:48.815 DEBUG parentName: , name: TaskSet_0, runningTasks: 0
12:15:49.806 WARN Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
12:15:49.816 DEBUG parentName: , name: TaskSet_0, runningTasks: 0
12:15:50.816 DEBUG parentName: , name: TaskSet_0, runningTasks: 0

The Spark worker log reports:

16/11/22 12:16:34 INFO Worker: Asked to launch executor app-20161122121634-0012/0 for Simple Application
16/11/22 12:16:34 INFO SecurityManager: Changing modify acls groups to: 
16/11/22 12:16:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(john); groups with view permissions: Set(); users  with modify permissions: Set(john); groups with modify permissions: Set()
16/11/22 12:16:34 INFO ExecutorRunner: Launch command: "/apps/jdk1.8.0_101/jre/bin/java" "-cp " "/apps/spark-2.0.2-bin-hadoop2.7/conf/:/apps/spark-2.0.2-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.driver.port=29015" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://[email protected]:29015" "--executor-id" "0" "--hostname" "192.168.56.103" "--cores" "1" "--app-id" "app-20161122121634-0012" "--worker-url" "spark://[email protected]:38701"


1
Try to kill the running application and see what happens! And let us know too. – Shiv4nsh
I tried stopping the client many times. The UI displays the application in the Completed section. That's wrong because the job did not actually execute; you can see "Simple Application" -> "Finished" in the attached image. The worker log shows: 16/11/22 12:17:12 INFO Worker: Asked to kill executor app-20161122121634-0012/0 16/11/22 12:17:12 INFO ExecutorRunner: Runner thread for executor app-20161122121634-0012/0 interrupted 16/11/22 12:17:12 INFO ExecutorRunner: Killing process! 16/11/22 12:17:13 INFO Worker: Executor app-20161122121634-0012/0 finished with state KILLED, exitStatus 143 – Vortex
First, try to submit the application, then see if it still says "Initial job has not accepted any resources", then go to the Spark UI to see how many applications are submitted. My hunch is that one application is waiting while another is executing and consuming all the resources! Then try to kill the running application and see what happens. – Shiv4nsh
I checked. There was only one job running. I added a 2nd image to my post. I can't think of anything else to look into. Any thoughts? FYI, running SparkPi via spark-submit worked fine: spark-submit --verbose --class org.apache.spark.examples.SparkPi --master spark://192.168.56.103:7077 ../examples/jars/spark-examples_2.11-2.0.2.jar – Vortex
This should not happen. If SparkPi runs fine, then the server configuration is correct! What does your SparkConf look like, and are you trying to submit multiple applications at once from the code? – Shiv4nsh

1 Answer

0
votes

Do you have a firewall blocking communications? As stated in my other answer:

Apache Spark on Mesos: Initial job has not accepted any resources:

While most of the other answers focus on resource allocation (cores, memory) on the Spark slaves, I would like to highlight that a firewall can cause exactly the same issue, especially when you are running Spark on cloud platforms.

If you can see the Spark slaves in the web UI, you have probably opened the standard ports 8080, 8081, 7077, and 4040. Nonetheless, when you actually run a job, it uses SPARK_WORKER_PORT, spark.driver.port and spark.blockManager.port, which by default are randomly assigned. If your firewall is blocking these ports, the master cannot retrieve any job-specific response from the slaves and returns this error.
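If opening everything is not an option, here is a rough sketch of how you could pin these normally random ports to fixed values on the driver side and then open only those in the firewall. The class name and the port numbers are just examples (29015 happens to match the spark.driver.port visible in the question's launch command), not values you must use:

import org.apache.spark.SparkConf;

public class FixedPortsConf {
    // Hypothetical helper: builds a driver-side SparkConf with fixed ports
    // so that only a known set of ports needs to be opened in the firewall.
    public static SparkConf build() {
        SparkConf conf = new SparkConf();
        conf.set("spark.driver.port", "29015");       // executors connect back to the driver on this port
        conf.set("spark.blockManager.port", "29016"); // block manager port on the driver and executors
        return conf;
    }
}

On the worker side, the equivalent is setting SPARK_WORKER_PORT (and SPARK_WORKER_WEBUI_PORT if needed) in conf/spark-env.sh so the worker's RPC port stops being random as well.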

You can run a quick test by opening all the ports and seeing whether the slave accepts jobs.