0
votes

Hope you all had a wonderful vacation. I am trying to set up a Hadoop cluster on Amazon EC2. While copying a data file from the local disk to HDFS with the command hadoop fs -copyFromLocal d.txt /user/ubuntu/data, I am getting a data replication error. The error from the log is the following:

15/01/06 07:40:36 WARN hdfs.DFSClient: Error Recovery for null bad datanode[0] nodes == null

15/01/06 07:40:36 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/ubuntu/data/d.txt" - Aborting... copyFromLocal: java.io.IOException: File /user/ubuntu/data/d.txt could only be replicated to 0 nodes, instead of 1

15/01/06 07:40:36 ERROR hdfs.DFSClient: Failed to close file /user/ubuntu/data/d.txt

Now, I have been checking Stack Overflow and other forums about this problem, and most of them point to the DataNode or TaskTracker not running as the probable cause, along with the relevant solutions. But these processes are running fine in my setup, as shown in this screenshot of the jps output: http://i.imgur.com/vS6kRPP.png
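For reference, on a node running all of the Hadoop 1.x daemons the jps output looks roughly like the listing below; the process IDs are just placeholders, and on a multi-node cluster the DataNode and TaskTracker appear on the slave nodes rather than on the master.

12241 NameNode
12377 DataNode
12512 SecondaryNameNode
12651 JobTracker
12793 TaskTracker
12902 Jps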

From the Hadoop wiki, the other possible causes are the DataNode not being able to talk to the NameNode because of networking or Hadoop configuration problems, or some configuration problem preventing effective two-way communication.

I have configured hadoop-env.sh, core-site.xml, hdfs-site.xml and mapred-site.xml following the tutorial http://tinyurl.com/l2wv6y9 . Could anyone please tell me where I am going wrong? I will be immensely grateful if anyone can help me resolve the problem.

Thanks,


3 Answers

1
vote

Well, the problem was in the security groups. When I created the EC2 instances, I created a new security group in which I had not configured the rules to allow the necessary ports to be open for connections.

When creating a group with the default options, we must add a rule for SSH on port 22. In order to have TCP and ICMP access, we need to add two additional security rules. Add 'All TCP', 'All ICMP' and 'SSH (22)' under the inbound rules; this should work fine.

If we are using an existing security group, we should check both the inbound and outbound rules. The same rules can also be added from the command line, as sketched below.
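This is only a rough sketch using the AWS CLI; sg-xxxxxxxx is a placeholder for your security group ID, and in practice the 'All TCP' rule should be restricted to the cluster's own address range instead of 0.0.0.0/0.

# SSH on port 22
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 22 --cidr 0.0.0.0/0

# All TCP (ideally limited to the cluster's subnet)
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 0-65535 --cidr 0.0.0.0/0

# All ICMP
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol icmp --port -1 --cidr 0.0.0.0/0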

0
votes

Make sure you can reach the datanodes (for example, by telnetting to their ports) so that a communication issue can be ruled out.
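For example, from the machine running the HDFS client (or from the namenode) you could try something like the following against each datanode; 50010 and 50020 are the Hadoop 1.x defaults for dfs.datanode.address and dfs.datanode.ipc.address, so adjust them if you have overridden those properties.

# Datanode data-transfer port (default 50010)
telnet <datanode-private-ip> 50010

# Datanode IPC port (default 50020)
telnet <datanode-private-ip> 50020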

0
votes

This exception can occur for several reasons; here, the data is not getting written to the datanode. The possible causes are listed below, and a few quick checks for them are sketched after the list.

1) The configured security rules do not permit proper communication.

2) The datanode storage is full.

3) The datanode has a different namespace ID than the rest of the cluster.

4) The datanode is busy with block scanning and reporting.

5) A negative block size is configured (dfs.block.size in hdfs-site.xml).
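A few quick checks for the points above, assuming a Hadoop 1.x layout; the /path/to/dfs directories are placeholders for whatever dfs.name.dir and dfs.data.dir are set to in your hdfs-site.xml.

# 2) Check free space on the datanode storage disk
df -h /path/to/dfs/data

# ...or ask HDFS for its capacity and the number of live datanodes
hadoop dfsadmin -report

# 3) The namespaceID in these two files must match
cat /path/to/dfs/name/current/VERSION    # on the namenode
cat /path/to/dfs/data/current/VERSION    # on the datanode

# 5) Make sure the configured block size is not negative
grep -A 1 dfs.block.size $HADOOP_HOME/conf/hdfs-site.xml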

If all the configurations and security rules are proper, then you can do the following tasks:

1) Stop the datanode process.

2) Delete the contents of the datanode storage directory.

3) Start the datanode again.

The above steps are for making the cluster ID in the datanode consistent with the namenode again; a minimal command sketch follows.
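This is only a sketch of those steps for a Hadoop 1.x installation, run on the affected datanode; /path/to/dfs/data is a placeholder for your dfs.data.dir, and clearing it destroys the blocks stored locally on that node.

# 1) Stop the datanode daemon
$HADOOP_HOME/bin/hadoop-daemon.sh stop datanode

# 2) Clear the datanode storage directory (removes the local block data)
rm -rf /path/to/dfs/data/*

# 3) Start the datanode again so it re-registers with the namenode
$HADOOP_HOME/bin/hadoop-daemon.sh start datanode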

NB: The best way to debug the problem is by checking the datanode and namenode logs. They will give you the exact reason for this error.
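For example, on a default Hadoop 1.x install the daemon logs live under $HADOOP_HOME/logs and follow the hadoop-<user>-<daemon>-<hostname>.log naming pattern, so something like this shows the most recent messages:

# On each datanode
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log

# On the namenode
tail -n 100 $HADOOP_HOME/logs/hadoop-*-namenode-*.log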