
Even a simple WordCount MapReduce job fails with the same error.

Hadoop 2.6.0

Below are the YARN logs.

It seems some sort of timeout happens during resource negotiation, but I am unable to verify exactly what causes the timeout.

2016-11-11 15:38:09,313 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1478856936677_0004_000002. Got exception: java.io.IOException: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.37.145:49054 remote=platform-demo/10.0.37.145:60487]; Host Details : local host is: "platform-demo/10.0.37.145"; destination host is: "platform-demo":60487;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy79.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.37.145:49054 remote=platform-demo/10.0.37.145:60487]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    ... 9 more
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.37.145:49054 remote=platform-demo/10.0.37.145:60487]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
    at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
    ... 12 more

2016-11-11 15:38:09,319 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1478856936677_0004_000002 with final state: FAILED, and exit status: -1000
2016-11-11 15:38:09,319 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1478856936677_0004_000002 State change from ALLOCATED to FINAL_SAVING

I tried changing the properties below; a sketch of how they sit in the config files follows the list.

yarn.nodemanager.resource.memory-mb = 2200
    (Amount of physical memory, in MB, that can be allocated for containers.)

yarn.scheduler.minimum-allocation-mb = 500

dfs.datanode.socket.write.timeout = 3000000

dfs.socket.timeout = 3000000
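For reference, this is roughly how those overrides look in XML form. I am assuming the usual file placement here: the yarn.* properties in yarn-site.xml and the dfs.* properties in hdfs-site.xml.

<!-- yarn-site.xml -->
<property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2200</value>
</property>
<property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>500</value>
</property>

<!-- hdfs-site.xml -->
<property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>3000000</value>
</property>
<property>
        <name>dfs.socket.timeout</name>
        <value>3000000</value>
</property>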

Can you share the MapReduce job command and the YARN web UI? - Nirmal Ram

Show us the output of netstat -tapun and the content of your /etc/hosts, please. - Alfonso Nishikawa

@AlfonsoNishikawa please find the netstat output: pastebin.com/esKK6CdP - Manoj Verma

Hi @AlfonsoNishikawa, thanks. I saw many connections to port 60487 and fixed that, but now there is a different issue: after the job is scheduled, it hangs. YARN logs are at pastebin.com/4c17Kv73 - Manoj Verma

1 Answer


Q1. MapReduce jobs failing after being accepted by YARN

Reason: around 130 connections were stuck on port 60487.
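You can confirm this kind of pile-up by counting sockets on the port from the AMLauncher error. A quick sketch, using the same netstat flags as in the comments and the port number from the logs above:

    # Count sockets involving the port the AM launcher was timing out on
    netstat -tapun | grep :60487 | wc -l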

Q2. MapReduce jobs failing after being accepted by YARN

The issue was due to the Hadoop tmp directory /app/hadoop/tmp. I emptied this directory, re-ran the MapReduce job, and the job executed successfully.
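The cleanup amounted to something like the following. This is a sketch assuming hadoop.tmp.dir is /app/hadoop/tmp, that nothing in it needs to be preserved, and that the single-node start/stop scripts are on the PATH. Be careful: on a default configuration the NameNode and DataNode data directories also live under hadoop.tmp.dir, so emptying it can wipe HDFS data.

    # Stop the daemons before touching the tmp dir
    stop-yarn.sh
    stop-dfs.sh

    # Empty hadoop.tmp.dir but keep the directory itself
    rm -rf /app/hadoop/tmp/*

    start-dfs.sh
    start-yarn.sh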

Q3. Unhealthy node: local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir

Edit yarn-site.xml with the following property:

<property>
        <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
        <value>98.5</value>
</property>
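This property raises the disk utilization threshold at which the health checker marks a local dir as bad (the default is 90.0 percent in Hadoop 2.x, as far as I know). Before raising it, it is worth confirming the disk really is near full; a quick check, assuming the nm-local-dir path from the error message:

    # See how full the filesystem backing the NodeManager local dir is
    df -h /tmp/hadoop-hduser/nm-local-dir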

Refer to: Why does Hadoop report "Unhealthy Node local-dirs and log-dirs are bad"?