
I am working on a multinode cluster with four slave nodes, named slave01, slave02, slave03 and slave04, and one master node named master.

When I pull out the network cable during a map task, Hadoop waits 100 seconds for a status update (due to the task timeout property, whose value is set to 100000 ms).
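In case it helps, this is how I understand the timeout to be configured; I am assuming the relevant property is mapreduce.task.timeout in mapred-site.xml (100000 ms = 100 seconds):

    <!-- mapred-site.xml: maximum time a task may go without reporting status
         before the application master marks the attempt as timed out -->
    <property>
      <name>mapreduce.task.timeout</name>
      <value>100000</value>
    </property>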

After that I can see that the map task fails and Hadoop starts container cleanup, which takes more than 10 minutes, and during that time it does not schedule the failed task anywhere. Then I get a No Route to Host exception from the application master to the lost node, after which the task finally gets scheduled on another node.

I want to reduce the time spent attempting container cleanup, so that the task can be scheduled on another node immediately after the map task times out.

Please tell me how I can do that by changing the configuration.
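For example, I believe the retries from the application master to the lost NodeManager are governed by the following YARN properties, but I am not sure; the values below are only illustrative, not recommendations:

    <!-- yarn-site.xml: how long and how often the AM keeps retrying a
         connection to a NodeManager before giving up on it -->
    <property>
      <name>yarn.client.nodemanager-connect.max-wait-ms</name>
      <value>30000</value>
    </property>
    <property>
      <name>yarn.client.nodemanager-connect.retry-interval-ms</name>
      <value>10000</value>
    </property>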

I am attaching the application master log. In this run I removed slave01 during the map task, and the number of reduce tasks running was 1.

AttemptID:attempt_1463201584280_0004_m_000002_0 Timed out after 100 secs
Container released on a lost node
cleanup failed for container container_1463201584280_0004_01_000004 : java.net.NoRouteToHostException: No Route to Host from slave02/172.31.132.107 to slave01:58838 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
    at sun.reflect.GeneratedConstructorAccessor51.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:757)
    at org.apache.hadoop.ipc.Client.call(Client.java:1473)
    at org.apache.hadoop.ipc.Client.call(Client.java:1400)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy37.stopContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.stopContainers(ContainerManagementProtocolPBClientImpl.java:110)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy38.stopContainers(Unknown Source)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:206)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:373)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:706)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:369)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1522)
    at org.apache.hadoop.ipc.Client.call(Client.java:1439)
    ... 15 more


1 Answer


This happens because of a bug in Hadoop 2.6.3 in which the connection retry is done at two levels, IPC and YARN. Try using 2.6.4, or download the patch, and it will get resolved.
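Until you can upgrade, you could also try capping the lower, IPC-level retries explicitly so the two retry levels do not compound into a multi-minute wait; this is only a sketch for core-site.xml, with purely illustrative values:

    <!-- core-site.xml: IPC client connection retries underneath the YARN-level retries -->
    <property>
      <!-- retries on connection failure (e.g. no route to host) -->
      <name>ipc.client.connect.max.retries</name>
      <value>3</value>
    </property>
    <property>
      <!-- retries on socket timeout while connecting -->
      <name>ipc.client.connect.max.retries.on.timeouts</name>
      <value>3</value>
    </property>
    <property>
      <!-- per-attempt connection timeout in milliseconds -->
      <name>ipc.client.connect.timeout</name>
      <value>10000</value>
    </property>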