This is driving me crazy. I have been working on this for days and can't solve it. I have a private cloud running on Eucalyptus for testing, with 4 VMs running Ubuntu 12.04. I am trying to get Cloudera to run HDFS and MapReduce, but when I start it up, the data-nodes never seem to be able to communicate with the name-node. It installs fine and passes all the pre-launch checks. The hosts files are all set up with 127.0.0.1 localhost plus the IPs and hostnames of the other VMs, firewalls are all disabled, and security groups are set to allow everything. I can connect to port 8022 on the name-node from the data-nodes with telnet, and netstat on the name-node looks like this:

tcp 0 0 172.31.254.119:9000 0.0.0.0:* LISTEN 6519/python
tcp 0 0 0.0.0.0:7432 0.0.0.0:* LISTEN 5672/postgres
tcp 0 0 127.0.0.1:9001 0.0.0.0:* LISTEN 6538/python
tcp 0 0 172.31.254.119:50090 0.0.0.0:* LISTEN 8694/java
tcp 0 0 0.0.0.0:7180 0.0.0.0:* LISTEN 5680/java
tcp 0 0 0.0.0.0:7182 0.0.0.0:* LISTEN 5680/java
tcp 0 0 172.31.254.119:8020 0.0.0.0:* LISTEN 8689/java
tcp 0 0 172.31.254.119:50070 0.0.0.0:* LISTEN 8689/java
tcp 0 0 172.31.254.119:8022 0.0.0.0:* LISTEN 8689/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 576/sshd
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 5486/postgres
tcp6 0 0 :::7432 :::* LISTEN 5672/postgres
tcp6 0 0 :::22 :::* LISTEN 576/sshd

yet the error I keep getting is:

Failed to publish event: SimpleEvent{attributes={STACKTRACE=[org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(172.31.254.110, storageID=DS-1259113373-172.31.254.110-50010-1378398035331, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=cluster9;nsid=46459994;c=0)

I would greatly appreciate advice from anyone with more Linux/Cloudera/Eucalyptus experience than I have.

Thanks all.

1 Answer

You have specified that you are using loopback, but the DN is identifying itself as 172.31.254.110. Map each machine's hostname to its real IP instead of 127.0.0.1. To be on the safe side, add the hostname and IP of each machine to the /etc/hosts file of every other machine. If the problem still persists, please post your config files.
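
To spot a hostname that is mapped to a loopback address, something like the following rough sketch could be run against the contents of each machine's /etc/hosts. The hostnames (namenode, datanode1) and the 127.0.1.1 line in the sample are my assumptions for illustration, not taken from your setup; only the two 172.31.254.x addresses come from your post. The helper names parse_hosts and loopback_entries are made up for this example.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname/alias in hosts-file text to the list of IPs it appears under."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name.lower(), []).append(ip)
    return mapping


def loopback_entries(text, hostname):
    """Return the loopback addresses that `hostname` is mapped to, if any."""
    found = []
    for ip in parse_hosts(text).get(hostname.lower(), []):
        try:
            if ipaddress.ip_address(ip).is_loopback:
                found.append(ip)
        except ValueError:
            continue  # skip anything that is not a parseable IP address
    return found


# Hypothetical hosts file: hostnames and the 127.0.1.1 line are assumptions.
SAMPLE_HOSTS = """\
127.0.0.1       localhost
127.0.1.1       datanode1   # hostname mapped to loopback (assumed for illustration)
172.31.254.119  namenode
172.31.254.110  datanode1
"""

print(loopback_entries(SAMPLE_HOSTS, "datanode1"))  # ['127.0.1.1']
print(loopback_entries(SAMPLE_HOSTS, "namenode"))   # []
```

If a machine's own hostname shows up in the first result, that loopback line is the one I would remove or change so the name only resolves to the real IP. To check a real machine, pass in open("/etc/hosts").read() instead of the sample text.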