7
votes

I have a 4-node cluster (1 NameNode/ResourceManager, 3 DataNodes/NodeManagers).

I am trying to run the simple Tez example OrderedWordCount:

hadoop jar C:\HDP\tez-0.4.0.2.1.1.0-1621\tez-mapreduce-examples-0.4.0.2.1.1.0-1621.jar orderedwordcount sample/test.txt /sample/out

The job gets accepted, and the ApplicationMaster and container get set up, but on the NodeManager I see these logs:

2014-09-10 17:53:31,982 INFO [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030

2014-09-10 17:53:34,060 INFO [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

After the configured retry limit is reached, the job fails.

Everything I found while searching for this problem points to the yarn.resourcemanager.scheduler.address configuration. I have this property defined correctly on my ResourceManager node and on all NodeManagers, but for some reason it's not getting picked up:

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>10.234.225.69</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
</property>
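One way to check whether this file is even visible to the client (assuming the standard hadoop classpath subcommand, which works on both Linux and the Windows hadoop.cmd) is:

    hadoop classpath

If the directory containing yarn-site.xml does not appear in the output, the client falls back to the default 0.0.0.0:8030, which is exactly what the log above shows.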

3 Answers

6
votes

It might be possible that your ResourceManager is listening on an IPv6 port while your worker nodes (i.e., the NodeManagers) are using IPv4 to connect to the ResourceManager.

To quickly check if this is the case, run:

netstat -aln | grep 8030

If you get something similar to :::8030, then your ResourceManager is indeed listening on an IPv6 port. If it's an IPv4 port, you should see something similar to 0.0.0.0:8030.

To fix this, you might want to consider disabling IPv6 on all your machines and trying again.
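One way to do that (a sketch, assuming Linux hosts; the sysctl keys can differ per distribution, and note that preferring the IPv4 stack is a generic JVM flag rather than anything Hadoop-specific):

    # disable IPv6 system-wide; persist the settings in /etc/sysctl.conf to survive reboots
    sysctl -w net.ipv6.conf.all.disable_ipv6=1
    sysctl -w net.ipv6.conf.default.disable_ipv6=1

    # alternatively, in hadoop-env.sh, make the Hadoop JVMs bind IPv4 sockets only
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

After applying either change, restart the ResourceManager and re-run the netstat check above.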

1
votes

There is a problem in the Hadoop 2 code with how yarn.resourcemanager.scheduler.address is picked up, e.g.:

<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>qadoop-nn001.apsalar.com:8030</value>
</property>

It is currently not properly placed into the 'conf' configuration in hadoop-2.7.0/src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java.

To prove the issue, we patched that file to inject our scheduler address directly. The patch below is a hack; the root cause is that the 'conf' object fails to load the property yarn.resourcemanager.scheduler.address.

@Private
protected static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, RMProxy instance) throws IOException {
  YarnConfiguration conf = (configuration instanceof YarnConfiguration)
      ? (YarnConfiguration) configuration
      : new YarnConfiguration(configuration);
  // HACK: inject the scheduler address directly, since the value from
  // yarn-site.xml was not making it into 'conf'
  LOG.info("LEE: changing the conf to include yarn.resourcemanager.scheduler.address at 10.1.26.1");
  conf.set("yarn.resourcemanager.scheduler.address", "10.1.26.1");
  RetryPolicy retryPolicy = createRetryPolicy(conf);
  if (HAUtil.isHAEnabled(conf)) {
    RMFailoverProxyProvider<T> provider =
        instance.createRMFailoverProxyProvider(conf, protocol);
    return (T) RetryProxy.create(protocol, provider, retryPolicy);
  } else {
    InetSocketAddress rmAddress = instance.getRMAddress(conf, protocol);
    LOG.info("LEE: Connecting to ResourceManager at " + rmAddress);
    T proxy = RMProxy.<T>getProxy(conf, protocol, rmAddress);
    return (T) RetryProxy.create(protocol, proxy, retryPolicy);
  }
}

EDIT: we solved this problem by adding yarn-site.xml to the CLASSPATH. There is no need to modify RMProxy.java.
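For example (a sketch; /etc/hadoop/conf is an assumed location and will differ on other installs, e.g. under C:\HDP on Windows):

    # see what is currently on the classpath
    hadoop classpath

    # prepend the directory that holds yarn-site.xml if it is missing
    export HADOOP_CLASSPATH=/etc/hadoop/conf:$HADOOP_CLASSPATH

With that directory on the classpath, YarnConfiguration loads yarn-site.xml on its own and RMProxy picks up the scheduler address without any patch.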

0
votes

It is because your ResourceManager is not reachable. Try to ping your ResourceManager from the other nodes and see if it works. Keep these configs consistent across the cluster.
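For instance (assuming Linux nodes and the addresses from the question; nc may need to be installed separately):

    ping -c 3 10.234.225.69          # basic reachability
    nc -zv 10.234.225.69 8030        # is the scheduler port accepting connections?

If the port check fails while ping succeeds, look at firewall rules and at which interface the ResourceManager is actually bound to.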