
I'm trying to submit a Spark job from a server outside of my Spark cluster (running Spark 1.4.0, Hadoop 2.4.0 and YARN) using the spark-submit script:

spark/bin/spark-submit --master yarn-client --executor-memory 4G myjobScript.py

The thing is that my application never moves past the ACCEPTED state; it gets stuck there:

15/07/08 16:49:40 INFO Client: Application report for application_1436314873375_0030 (state: ACCEPTED)
15/07/08 16:49:41 INFO Client: Application report for application_1436314873375_0030 (state: ACCEPTED)
15/07/08 16:49:42 INFO Client: Application report for application_1436314873375_0030 (state: ACCEPTED)
15/07/08 16:49:43 INFO Client: Application report for application_1436314873375_0030 (state: ACCEPTED)
15/07/08 16:49:44 INFO Client: Application report for application_1436314873375_0030 (state: ACCEPTED)
15/07/08 16:49:45 INFO Client: Application report for application_1436314873375_0030 (state: ACCEPTED)
15/07/08 16:49:46 INFO Client: Application report for application_1436314873375_0030 (state: ACCEPTED)
15/07/08 16:49:47 INFO Client: Application report for application_1436314873375_0030 (state: ACCEPTED)
15/07/08 16:49:48 INFO Client: Application report for application_1436314873375_0030 (state: ACCEPTED)
15/07/08 16:49:49 INFO Client: Application report for application_1436314873375_0030 (state: ACCEPTED)

But if I execute the same script with spark-submit on the master node of my cluster, it runs correctly.

I already set the YARN configuration on the remote server in $YARN_CONF_DIR/yarn-site.xml like this:

 <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>54.54.54.54</value>
 </property>

 <property>
   <name>yarn.resourcemanager.address</name>
   <value>54.54.54.54:8032</value>
   <description>Enter your ResourceManager hostname.</description>
 </property>

 <property>
   <name>yarn.resourcemanager.scheduler.address</name>
   <value>54.54.54.54:8030</value>
   <description>Enter your ResourceManager hostname.</description>
 </property>

 <property>
   <name>yarn.resourcemanager.resource-tracker.address</name>
   <value>54.54.54.54:8031</value>
   <description>Enter your ResourceManager hostname.</description>
 </property>

Where 54.54.54.54 is the IP of my ResourceManager node.

Why is this happening? Do I have to configure something else in YARN to accept remote submissions? Or what am I missing?

Thanks a lot

JG


2 Answers


I suspect the problem is that your application master on YARN is unable to reach back to your local host. Have you checked whether any logs are attached to your accepted application? You may need to set the SPARK_LOCAL_IP environment variable to a cluster-addressable IP address so YARN can reach back to you.
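A minimal sketch of that setup, run on the submitting machine before spark-submit. The address 55.55.55.55 is a placeholder for an IP on the submitting box that the cluster nodes can actually route back to:

```shell
# Assumption: 55.55.55.55 is an address on this machine that the
# YARN nodes can reach; replace it with your own routable IP.
export SPARK_LOCAL_IP=55.55.55.55

spark/bin/spark-submit --master yarn-client --executor-memory 4G myjobScript.py
```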

Have you tried running in yarn-cluster mode instead, so your driver program actually runs on the YARN cluster? That can be a better option if your local host is far from the cluster, since it avoids the driver-to-cluster communication latency.
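For example, the command from the question switched to cluster mode (same script and memory setting; only the master changes, and the driver then runs on a YARN node rather than on the submitting machine):

```shell
# Same job as before, but the driver runs inside the YARN cluster
# instead of on the machine that runs spark-submit.
spark/bin/spark-submit --master yarn-cluster --executor-memory 4G myjobScript.py
```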


I can think of two things:

  1. spark-submit looks for the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable to locate yarn-site.xml on the local box, not on the remote boxes. Make sure one of them is exported on the machine you submit from.

  2. If the above is done and you still get the same issue, you may need to look into the network/firewall setup: Spark communicates with the YARN ResourceManager (and other daemons) over several ports, and Spark's internal communication goes through Akka, whose ports are fairly random. It is easiest to first open all TCP ports between the two machines and see whether that works.
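If opening everything is not acceptable, Spark 1.x lets you pin the otherwise random ports to fixed values so the firewall rules can stay narrow. A sketch, where the port numbers 40000-40003 are arbitrary example choices, not defaults:

```shell
# Pin Spark's normally random ports to fixed values so that only
# these (plus the YARN RM ports 8030-8032) need firewall openings.
# The port numbers here are arbitrary examples.
spark/bin/spark-submit --master yarn-client --executor-memory 4G \
  --conf spark.driver.port=40000 \
  --conf spark.fileserver.port=40001 \
  --conf spark.broadcast.port=40002 \
  --conf spark.blockManager.port=40003 \
  myjobScript.py
```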

Hope this helps.