DataProc Cluster Spark Job submission fails to start NodeManager

Question

We have a Dataproc cluster configured with 4 workers. The cluster is up and running, but whenever we try to submit a Spark job we get this error:

YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager, Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager

Some of the messages seen in the Stackdriver logs:

Daemon YARN_NODE_MANAGER failed to restart

Update: The same issue occurs when we add a new worker node to the existing Dataproc cluster.

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager, Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from <MasterNode DNS> , Sending SHUTDOWN signal to the NodeManager.
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:374)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:252)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:845)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:912)

Have you made changes to the cluster or YARN configuration? Have you looked at the logs on Stackdriver? — David Rabinowitz
Checked the logs, not sure. I recreated the cluster and I don't see this error anymore. — Usman Azhar
A similar issue appears when we add a new worker node to an existing cluster: after creating the cluster, whenever we resize it to add a node, the following errors are logged. — Usman Azhar
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager, Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from <MasterNode DNS> , Sending SHUTDOWN signal to the NodeManager. at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:374) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:252) — Usman Azhar

Zhang Bo · Accepted Answer · 2019-09-10T20:35:03

This error looks like a YARN NodeManager decommissioning problem. Can you check whether there is a mistake in the following YARN include/exclude node configuration files on the Dataproc master GCE VM:

/etc/hadoop/conf/nodes_exclude
/etc/hadoop/conf/nodes_include
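
As a rough way to do that check, the sketch below reports how a given worker hostname appears in the two files. The function name `node_list_status` is made up for illustration, and it assumes each file holds one hostname per line, written exactly as the ResourceManager reports it in the "Disallowed NodeManager from ..." message. Keep in mind that YARN rejects a node that is in the exclude list, and also a node that is missing from a non-empty include list.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop or Dataproc): report how a hostname
# appears in the YARN include/exclude lists.
# Prints one of: excluded, included, unlisted.
# Assumes one hostname per line in each file.
node_list_status() {
    host="$1"
    include_file="${2:-/etc/hadoop/conf/nodes_include}"
    exclude_file="${3:-/etc/hadoop/conf/nodes_exclude}"

    # -x matches whole lines only, -F treats the hostname as a literal string.
    if [ -f "$exclude_file" ] && grep -qxF "$host" "$exclude_file"; then
        echo "excluded"
    elif [ -f "$include_file" ] && grep -qxF "$host" "$include_file"; then
        echo "included"
    else
        echo "unlisted"
    fi
}

# Example, run on the master (the worker name is a placeholder):
#   node_list_status my-cluster-w-4
```

A result of "excluded" points at a stale entry in nodes_exclude. A result of "unlisted" is only a problem if nodes_include is non-empty.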

After changing these config files, run the refresh nodes command:

yarn rmadmin -refreshNodes

You should then see the NodeManager rejoin the YARN cluster.
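
One way to confirm the rejoin is to look at the ResourceManager's node list with `yarn node -list -all`. The sketch below filters that output for a single host. The function name `node_is_running` is hypothetical, and it assumes the usual column layout where the first column is the node ID (`host:port`) and the second is the node state; adjust it if your Hadoop version prints something different.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given host is listed as RUNNING
# in `yarn node -list -all` output supplied on stdin.
# Assumes column 1 is the node ID (host:port) and column 2 is the node state.
node_is_running() {
    awk -v host="$1" '
        # Match "host:" as a prefix of the node ID so worker-1 does not match worker-10.
        index($1, host ":") == 1 && $2 == "RUNNING" { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Example, run on the master (the worker name is a placeholder):
#   yarn rmadmin -refreshNodes
#   yarn node -list -all | node_is_running my-cluster-w-4 && echo "rejoined"
```

If the node still shows as DECOMMISSIONED or is missing from the list, the NodeManager service on that worker may need to be restarted after the refresh.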

For details, please refer to: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/GracefulDecommission.html#nodeslistmanager-detects-and-handles-include-and-exclude-list-changes
