7
votes

I'm trying to use YARN node labels to tag worker nodes, but when I run applications on YARN (Spark or simple YARN app), those applications cannot start.

  • with Spark, when specifying --conf spark.yarn.am.nodeLabelExpression="my-label", the job cannot start (blocked on Submitted application [...], see details below).

  • with a YARN application (like distributedshell), when specifying -node_label_expression my-label, the application cannot start either.

Here are the tests I have run so far.

YARN node labels setup

I'm using Google Dataproc to run my cluster (for example: 4 workers, 2 of them on preemptible nodes). My goal is to force any YARN application master to run on a non-preemptible node; otherwise, the node can be shut down at any time, making the application fail hard.

I'm creating the cluster using YARN properties (--properties) to enable node labels:

gcloud dataproc clusters create \
    my-dataproc-cluster \
    --project [PROJECT_ID] \
    --zone [ZONE] \
    --master-machine-type n1-standard-1 \
    --master-boot-disk-size 10 \
    --num-workers 2 \
    --worker-machine-type n1-standard-1 \
    --worker-boot-disk-size 10 \
    --num-preemptible-workers 2 \
    --properties 'yarn:yarn.node-labels.enabled=true,yarn:yarn.node-labels.fs-store.root-dir=/system/yarn/node-labels'

Versions of packaged Hadoop and Spark:

  • Hadoop version : 2.8.2
  • Spark version : 2.2.0

After that, I create a label (my-label) and apply it to the two non-preemptible workers:

yarn rmadmin -addToClusterNodeLabels "my-label(exclusive=false)"
yarn rmadmin -replaceLabelsOnNode "\
    [WORKER_0_NAME].c.[PROJECT_ID].internal=my-label \
    [WORKER_1_NAME].c.[PROJECT_ID].internal=my-label"

I can see the created label in the YARN Web UI:

[Screenshot: Label created on YARN]
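For reference, the same information can also be checked from the command line; I believe these commands are available in Hadoop 2.8 (the node ID placeholder is mine):

# List the node labels known to the cluster
yarn cluster --list-node-labels

# Show the labels attached to a specific node (node ID = host:port of its NodeManager)
yarn node -status [WORKER_0_NAME].c.[PROJECT_ID].internal:[NM_PORT]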

Spark

When I run a simple example (SparkPi) without specifying anything about node labels:

spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  /usr/lib/spark/examples/jars/spark-examples.jar \
  10

In the Scheduler tab of the YARN Web UI, I see the application launched on <DEFAULT_PARTITION>.root.default.

But when I run the job specifying spark.yarn.am.nodeLabelExpression to set where the Spark application master should run:

spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    --conf spark.yarn.am.nodeLabelExpression="my-label" \
    /usr/lib/spark/examples/jars/spark-examples.jar \
    10

The job is not launched. In the YARN Web UI, I see:

  • YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
  • Diagnostics: Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = my-label ; Partition Resource = <memory:6144, vCores:2> ; Queue's Absolute capacity = 0.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 0.0 % ;

I suspect that the queue related to the label partition (not <DEFAULT_PARTITION>, the other one) does not have sufficient resources to run the job:

[Screenshot: Spark job accepted]

Here, Used Application Master Resources is <memory:1024, vCores:1>, but the Max Application Master Resources is <memory:0, vCores:0>. That explains why the application cannot start, but I can't figure out how to change this.
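If I understand the CapacityScheduler correctly, the AM limit on a partition is derived from the queue's capacity on that partition, so a 0.0 % capacity necessarily gives a zero AM limit. A rough calculation, assuming the default yarn.scheduler.capacity.maximum-am-resource-percent of 0.1:

root.default resources on the my-label partition = 0.0 % * <memory:6144, vCores:2> = <memory:0, vCores:0>
max AM resources on that partition               = 0.1 * <memory:0, vCores:0>      = <memory:0, vCores:0>
AM container request = <memory:1024, vCores:1>   > <memory:0, vCores:0>            => stuck in ACCEPTED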

I tried updating different parameters, but without success:

yarn.scheduler.capacity.root.default.accessible-node-labels=my-label

Or increasing those properties:

yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.capacity
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.maximum-capacity
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.maximum-am-resource-percent
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.user-limit-factor
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.minimum-user-limit-percent

but again without success.
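For reference, these are plain CapacityScheduler properties; they can be changed in capacity-scheduler.xml and, in principle, reloaded with a queue refresh, roughly as below (a sketch, assuming the standard /etc/hadoop/conf location on Dataproc):

# On the master node, edit the CapacityScheduler configuration
# (/etc/hadoop/conf is the usual configuration directory on Dataproc; adjust if needed)
sudo vi /etc/hadoop/conf/capacity-scheduler.xml

# Reload the queue configuration so the changes are taken into account
yarn rmadmin -refreshQueues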

YARN Application

The issue is the same when running a YARN application:

hadoop jar \
    /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar \
    -shell_command "echo ok" \
    -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar \
    -queue default \
    -node_label_expression my-label

The application cannot start, and the logs keep repeating:

INFO distributedshell.Client: Got application report from ASM for, appId=6, clientToAMToken=null, appDiagnostics= Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = my-label ; Partition Resource = <memory:6144, vCores:2> ; Queue's Absolute capacity = 0.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 0.0 % ; , appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1520354045946, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, [...]

If I don't specify -node_label_expression my-label, the application starts on <DEFAULT_PARTITION>.root.default and succeeds.

Questions

  • Am I doing something wrong with the labels? I followed the official documentation and this guide.
  • Is this a problem specific to Dataproc? The previous guides seem to work in other environments.
  • Maybe I need to create a specific queue and associate it with my label (see the sketch after this list)? But since I'm running a "one-shot" cluster for a single Spark job, I don't need dedicated queues; running jobs on the default root queue is not a problem for my use case.
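In case it matters, here is roughly what I imagine such a dedicated queue would look like in capacity-scheduler.xml (untested sketch; the queue name labeled is made up):

<!-- Untested sketch: a dedicated "labeled" queue bound to my-label -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,labeled</value>
</property>
<!-- The new queue takes no resources from the default partition -->
<property>
  <name>yarn.scheduler.capacity.root.labeled.capacity</name>
  <value>0</value>
</property>
<!-- Expose my-label at the root level and give the whole label partition to the new queue -->
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels.my-label.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.labeled.accessible-node-labels</name>
  <value>my-label</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.labeled.accessible-node-labels.my-label.capacity</name>
  <value>100</value>
</property>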

Thanks for your help.

1
Hi! GCP Support here. After reproducing your issue, I think it might be worth reporting it in the Public Issue Tracker so that it can be better tracked there. That way, you will be able to provide additional information that may be required to troubleshoot the issue. With the information we have right now, we have not been able to identify the root cause of the issue you are facing, so there may be a better chance of tracking it in the PIT. If you do so, feel free to post that as an answer, so that the community is aware of it. – dsesto
Hello, we just created an issue as you recommended. So, as I understand it, the problem we have is related to Dataproc, not YARN, right? – norbjd
Thanks for doing so. At the moment we do not know where the issue comes from, but I hope we will have more information as the investigation progresses. Feel free to post the link to the PIT so that the community can track its resolution too. – dsesto

1 Answer

5
votes

A Google engineer answered us (on a private issue we raised, not in the PIT) and gave us a solution: specify an initialization script at Dataproc cluster creation. I don't think the issue comes from Dataproc; this is basically just YARN configuration. The script sets the following properties in capacity-scheduler.xml, just after creating the node label (my-label):

<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
  <value>my-label</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels.my-label.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
  <value>my-label</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.capacity</name>
  <value>100</value>
</property>

The comment that goes along with the script says it will "set accessible-node-labels on both root (the root queue) and root.default (the default queue applications actually get run on)". The root.default part is what was missing in my tests. The capacity for both is set to 100.

Then, restarting YARN (systemctl restart hadoop-yarn-resourcemanager.service) is needed to apply the modifications.

After that, I was able to start the jobs that previously failed in my question.
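For completeness, here is a sketch of what such an initialization action could look like. This is my own reconstruction, not the exact script we received; bdconfig is the XML-editing utility shipped on Dataproc images, and the metadata lookup for the node role is the usual Dataproc pattern:

#!/bin/bash
# Sketch of an initialization action reproducing the fix (reconstruction, not the original script).
set -euxo pipefail

ROLE="$(/usr/share/google/get_metadata_value attributes/dataproc-role)"
CONF='/etc/hadoop/conf/capacity-scheduler.xml'

if [[ "${ROLE}" == 'Master' ]]; then
  # Create the (non-exclusive) node label first
  yarn rmadmin -addToClusterNodeLabels "my-label(exclusive=false)"

  # Make my-label accessible from both root and root.default, each with 100% capacity
  bdconfig set_property --configuration_file "${CONF}" \
    --name 'yarn.scheduler.capacity.root.accessible-node-labels' --value 'my-label' --clobber
  bdconfig set_property --configuration_file "${CONF}" \
    --name 'yarn.scheduler.capacity.root.accessible-node-labels.my-label.capacity' --value '100' --clobber
  bdconfig set_property --configuration_file "${CONF}" \
    --name 'yarn.scheduler.capacity.root.default.accessible-node-labels' --value 'my-label' --clobber
  bdconfig set_property --configuration_file "${CONF}" \
    --name 'yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.capacity' --value '100' --clobber

  # Restart the ResourceManager so the new configuration is loaded
  systemctl restart hadoop-yarn-resourcemanager.service
fi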

Hope this helps people facing the same or similar issues.