0 votes

I am trying to deploy Hadoop-RDMA on an 8-node InfiniBand (OFED-1.5.3-4.0.42) cluster and ran into the following problem (a.k.a. "File ... could only be replicated to 0 nodes, instead of 1"):

frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfs -copyFromLocal ../pg132.txt /user/frolo/input/pg132.txt
Warning: $HADOOP_HOME is deprecated.

14/02/05 19:06:30 WARN hdfs.DFSClient: DataStreamer Exception: java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown Source)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown Source)
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.From.Code(Unknown Source)
    at org.apache.hadoop.hdfs.From.F(Unknown Source)
    at org.apache.hadoop.hdfs.From.F(Unknown Source)
    at org.apache.hadoop.hdfs.The.run(Unknown Source)
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/frolo/input/pg132.txt could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown Source)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
    at org.apache.hadoop.ipc.rdma.madness.Code(Unknown Source)
    at org.apache.hadoop.ipc.rdma.madness.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(Unknown Source)
    at org.apache.hadoop.ipc.rdma.be.run(Unknown Source)
    at org.apache.hadoop.ipc.rdma.RDMAClient.Code(Unknown Source)
    at org.apache.hadoop.ipc.rdma.RDMAClient.call(Unknown Source)
    at org.apache.hadoop.ipc.Tempest.invoke(Unknown Source)
    ... 12 more

14/02/05 19:06:30 WARN hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null
14/02/05 19:06:30 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/frolo/input/pg132.txt" - Aborting...
14/02/05 19:06:30 INFO hdfs.DFSClient: exception in isClosed

It seems that data is not being transferred to the DataNodes when I start copying from the local filesystem to HDFS. I tested the availability of the DataNodes:

frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfsadmin -report
Warning: $HADOOP_HOME is deprecated.

Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 0 (4 total, 4 dead)

Name: 10.10.1.13:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014


Name: 10.10.1.14:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014


Name: 10.10.1.16:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014


Name: 10.10.1.11:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:55 MSK 2014

I also tried to mkdir in the HDFS filesystem, which was successful. Restarting the Hadoop daemons has not produced any positive effect.
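One way to check whether the DataNode processes are actually alive on each worker node (a minimal sketch, assuming the default Hadoop 1.x log layout under $HADOOP_HOME/logs; your paths may differ):

    # on each DataNode host: jps should list a "DataNode" Java process
    jps
    # inspect the DataNode log for bind/storage errors (default Hadoop 1.x log naming)
    tail -n 50 $HADOOP_HOME/logs/hadoop-*-datanode-*.log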

Could you please help me with this issue? Thank you.

Best, Alex

2
It seems that I had not noticed that the capacity is 0 KB. I cannot understand why. – Alexander
Your data nodes are not up; check the datanode logs: "Datanodes available: 0 (4 total, 4 dead)". – rVr

2 Answers

4 votes

I have found my problem. The issue was related to the configuration of hadoop.tmp.dir, which had been set to an NFS partition. By default it is configured to /tmp, which is on the local filesystem. After removing hadoop.tmp.dir from core-site.xml, the problem was solved.
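For illustration, the offending entry in core-site.xml looked roughly like the following (the NFS path here is a placeholder, not the actual mount point). Deleting the property makes Hadoop fall back to the default /tmp/hadoop-${user.name} on the local filesystem:

    <!-- core-site.xml: hadoop.tmp.dir pointed at an NFS mount (placeholder path) -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/nfs/shared/hadoop-tmp</value>
    </property>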

0 votes

In my case, this issue was resolved by opening the firewall on port 50010.
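For example, with iptables the rule could look like this (a sketch to be run as root on each DataNode; adapt it to whatever firewall frontend your distribution uses):

    # allow inbound HDFS data-transfer traffic on the DataNode port (default 50010)
    iptables -I INPUT -p tcp --dport 50010 -j ACCEPT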