2
votes

Recently YARN, and more specifically the ResourceManager, has stopped starting. Hunting through the log at /var/log/hadoop-yarn/yarn/yarn-yarn-resourcemanager-scottvih2sa-92-namenode.log, I found this error:

2015-12-02 20:18:13,287 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1241)) - Error starting ResourceManager
java.lang.IllegalArgumentException: Illegal capacity of -1.0 for node-label=default in queue=root, valid capacity should in range of [0, 100].
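
For anyone hunting for the same failure in a long log, here is a minimal Python sketch that pulls the offending value, node label, and queue out of messages like the one above. The regular expression is modelled only on the single line quoted here, and the names `CAPACITY_ERROR` and `find_capacity_errors` are made up for illustration; other YARN versions may word the message differently.

```python
import re

# Pattern modelled on the one log line quoted in the question (an assumption:
# other YARN releases may phrase this message differently).
CAPACITY_ERROR = re.compile(
    r"Illegal capacity of (?P<value>-?\d+(?:\.\d+)?) "
    r"for node-label=(?P<label>\S+) in queue=(?P<queue>[^,\s]+)"
)


def find_capacity_errors(log_text):
    """Return (value, label, queue) for each illegal-capacity message found."""
    return [
        (float(m.group("value")), m.group("label"), m.group("queue"))
        for m in CAPACITY_ERROR.finditer(log_text)
    ]


sample = (
    "java.lang.IllegalArgumentException: Illegal capacity of -1.0 "
    "for node-label=default in queue=root, "
    "valid capacity should in range of [0, 100]."
)
print(find_capacity_errors(sample))  # [(-1.0, 'default', 'root')]
```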

I looked at the property yarn.scheduler.capacity.root.accessible-node-labels.default.capacity in Ambari, and it is indeed set to -1. If I change this property along with the corresponding maximum-capacity property, YARN/ResourceManager starts.

So I know what the problem is, but not why it happens. I've been using Ambari blueprints to install HDP 2.2.x for many months, and until recently everything has been fine with YARN. I'm not overriding any of the scheduler properties in the blueprint, so I don't think I'm doing anything wrong.

Is anyone else seeing the same thing? If others are not having this problem, could it be something I'm doing wrong in my blueprint? Or is this an HDP/Ambari bug in the latest release?

My version is HDP 2.2.9.0-3393

2
I am using HDP 2.3.0. I never faced this problem. The capacity for the default queue is always set to 95:
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>95</value>
  <description>Default queue target capacity.</description>
</property>
- Manjunath Ballur
I think this is a new problem with 2.2.9. Good to know 2.3.0 doesn't have the problem. - scott frolich
I believe I'm having the same issue: community.hortonworks.com/questions/6519/…. - slm

2 Answers

2
votes

I had the same problem. It turned out to be caused by two options that were both being applied with an out-of-bounds value:

<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels.default.capacity</name>
  <value>-1</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity</name>
  <value>-1</value>
</property>

To work around the issue, I removed these entries and restarted the ResourceManager service.
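
To check a configuration for the same kind of entry before restarting, something like the following Python sketch may help. It assumes a Hadoop-style XML document with a `<configuration>` root, and it takes the [0, 100] range from the error message in the question; the function name `out_of_range_capacities` is hypothetical, and whether YARN rejects every property this flags is an assumption, so treat the result as a list of candidates to inspect.

```python
import xml.etree.ElementTree as ET


def out_of_range_capacities(xml_text, low=0.0, high=100.0):
    """Return (name, value) for capacity properties outside [low, high]."""
    root = ET.fromstring(xml_text)
    flagged = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        raw = (prop.findtext("value") or "").strip()
        # Covers names ending in ".capacity" and ".maximum-capacity".
        if not name.endswith("capacity"):
            continue
        try:
            value = float(raw)
        except ValueError:
            continue  # skip non-numeric values rather than guess
        if not low <= value <= high:
            flagged.append((name, value))
    return flagged


# Sample input: the two entries from this answer plus one in-range property.
# The <configuration> wrapper is assumed, not taken from the thread.
sample = """
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.default.capacity</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>95</value>
  </property>
</configuration>
"""

for name, value in out_of_range_capacities(sample):
    print(name, value)
```

With the sample above, only the two accessible-node-labels entries are reported; the property set to 95 passes the range check.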

NOTE: I had to make these changes through Ambari; editing the file directly did not seem to take effect, and I'm not sure why.

The above looks to be a bug in Ambari: https://issues.apache.org/jira/browse/AMBARI-13232. Thanks to JonasStraub for helping to dig this all up.

1
votes

The thread slm linked (https://community.hortonworks.com/questions/6519/resourcemanager-cannot-start.html) implied that Ambari might have something to do with the problem. I checked and found I was using Ambari 2.1.0. I switched to Ambari 2.1.2 and the problem was fixed.