I am trying to deploy Hadoop-RDMA on an 8-node IB (OFED-1.5.3-4.0.42) cluster and ran into the following problem (a.k.a. "File ... could only be replicated to 0 nodes, instead of 1"):
frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfs -copyFromLocal ../pg132.txt /user/frolo/input/pg132.txt
Warning: $HADOOP_HOME is deprecated.

14/02/05 19:06:30 WARN hdfs.DFSClient: DataStreamer Exception: java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(Unknown Source)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(Unknown Source)
    at com.sun.proxy.$Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.From.Code(Unknown Source)
    at org.apache.hadoop.hdfs.From.F(Unknown Source)
    at org.apache.hadoop.hdfs.From.F(Unknown Source)
    at org.apache.hadoop.hdfs.The.run(Unknown Source)
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/frolo/input/pg132.txt could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(Unknown Source)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.ipc.RPC$Server.call(Unknown Source)
    at org.apache.hadoop.ipc.rdma.madness.Code(Unknown Source)
    at org.apache.hadoop.ipc.rdma.madness.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(Unknown Source)
    at org.apache.hadoop.ipc.rdma.be.run(Unknown Source)
    at org.apache.hadoop.ipc.rdma.RDMAClient.Code(Unknown Source)
    at org.apache.hadoop.ipc.rdma.RDMAClient.call(Unknown Source)
    at org.apache.hadoop.ipc.Tempest.invoke(Unknown Source)
    ... 12 more
14/02/05 19:06:30 WARN hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null
14/02/05 19:06:30 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/frolo/input/pg132.txt" - Aborting...
14/02/05 19:06:30 INFO hdfs.DFSClient: exception in isClosed
It seems that no data is transferred to the DataNodes when I start copying from the local filesystem to HDFS. I checked the availability of the DataNodes:
frolo@A11:~/hadoop-rdma-0.9.8> ./bin/hadoop dfsadmin -report
Warning: $HADOOP_HOME is deprecated.

Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: �%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 0 (4 total, 4 dead)

Name: 10.10.1.13:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014

Name: 10.10.1.14:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014

Name: 10.10.1.16:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:54 MSK 2014

Name: 10.10.1.11:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0(0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Feb 05 19:02:55 MSK 2014
I also tried to mkdir in the HDFS filesystem, which succeeded. Restarting the Hadoop daemons has not had any positive effect.
Could you please help me with this issue? Thank you.
Best, Alex