1
votes

Help please Folks

I am trying to set up my Hadoop multinode env (1 master, 1 secondary and 3 slaves - hadoop 2.7.1/Ubuntu 14 on AWS) and i am getting "NameSystem.getDatanode" ERROR message. I browsed and read and tried but reach my limits. Could you point me at least in some direction

Logs (extract) from master - xxx-141/142/143 are the ip of the slaves ''''''''''''''''''''''''''''''''''' Line 134: 2016-01-23 17:36:19,432 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.143:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 172.31.22.141:50010 is expected to serve this storage.

Line 135: 2016-01-23 17:36:19,457 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.142:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 172.31.22.141:50010 is expected to serve this storage.

Line 159: 2016-01-23 17:36:20,988 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.141:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node XXX.XX.XX.143:50010 is expected to serve this storage.

Extract From SLAVE2 SERVER logs '''''''''''''''''''''''''''''''''''''

2016-01-23 17:36:14,812 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: `2016-01-23 17:36:18,607 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x3c90bbfe60c, containing 1 storage report(s), of which we sent 0. The reports had 1 total blocks and used 0 RPC(s). This took 4 msec to generate and 144 msecs for RPC and NN processing. Got back no commands. 2016-01-23 17:36:18,608 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d) service to master/MAST.XX.XX.169:9000 is shutting down org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(1XX.XX.XX.142:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 1XX.XX.XX.141:50010 is expected to serve this storage. at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:495) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1791) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1315) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:163) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28543) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1407)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy13.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:199)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:463)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:688)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823)
at java.lang.Thread.run(Thread.java:745)

2016-01-23 17:36:18,610 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d) service to master/MAST.XX.XX.169:9000 2016-01-23 17:36:18,611 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d) 2016-01-23 17:36:18,611 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing block pool BP-1050309752-MAST.XX.XX.169-1453113991010 2016-01-23 17:36:20,611 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode 2016-01-23 17:36:20,613 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0 2016-01-23 17:36:20,614 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down DataNode at ip-SLAV-XX-XX-142/1XX.XX.XX.142 ************************************************************/`

1

1 Answers

0
votes

it looks like you have three slaves

172.31.22.141:50010
172.31.22.142:50010
172.31.22.143:50010

and you created two of them from a clone of the first slave, after the slave was already included in the cluster. The two clones now already have a copy of the DFS and use the same storage ID as the first slave. Only one slave with the same ID is expected by the name server. It is trying to tell you this by logging:

[...] is attempting to report storage ID [...].
Node [...]:50010 is expected to serve this storage.

You can try removing the dfs directory on two of the slaves, then restarting them.

i.e. stop the slaves, do a rm -rf on the dfs directory like:

rm -rf /tmp/hadoop-hadoop/dfs/

You can then restart and check that all slaves are connecting and test file replication, e.g. by setting the replication level to 4 for some files like:

hdfs dfs -setrep -w 4 -R /user/somedir

The -w option causes the command to wait until replication has succeeded.