I set up a 2-node Hadoop cluster on AWS, with the NameNode and the JobTracker running on the master, and a TaskTracker and DataNode on both the master and the slave. When I start DFS, it tells me that the NameNode, the DataNode on both nodes, and the secondary NameNode have started. When I start MapReduce, it likewise reports that the JobTracker started, as well as the TaskTracker on both nodes. I then ran an example job to make sure everything was working, but the NameNode web interface showed that only one TaskTracker was being used. I checked the logs, and both the DataNode and TaskTracker logs on the slave had something along the lines of:
2013-08-08 21:31:04,196 INFO org.apache.hadoop.ipc.RPC: Server at ip-10-xxx-xxx-xxx/10.xxx.xxx.xxx:9000 not available yet, Zzzzz...
2013-08-08 21:31:06,202 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ip-10-xxx-xxx-xxx/10.xxx.xxx.xxx:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
The NameNode is running on port 9000; this was in the DataNode log. The TaskTracker log had the same thing, except with port 9001, where the JobTracker is running. I found a page on the Apache wiki about this error (http://wiki.apache.org/hadoop/ServerNotAvailable), but none of the possible causes it lists seem to apply in my case. Since I'm running both nodes on AWS, I also made sure the security group allows traffic on both ports.
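Since the slave's daemons just keep retrying the connection, one sanity check is to see whether the ports are reachable from the slave at all, and what address the master's daemons are actually bound to (the commands below use the same redacted IP as the logs; `nc` and `netstat` are assumed to be installed on the instances):

```shell
# From the slave: can we open TCP connections to the master's
# NameNode (9000) and JobTracker (9001) ports at all?
nc -z -v -w 5 10.xxx.xxx.xxx 9000
nc -z -v -w 5 10.xxx.xxx.xxx 9001

# On the master: which address are the daemons listening on?
# If this shows 127.0.0.1:9000 rather than the private IP or
# 0.0.0.0, remote nodes will never be able to connect.
sudo netstat -tnlp | grep -E ':(9000|9001)'
```

If `nc` connects but the Hadoop daemons still can't, the problem is above the network layer; if `nc` times out, it points back at the security group or the bind address.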
In summary:

- The TaskTracker and DataNode on the slave node won't connect to the master.
- I know the IP addresses are right; I've checked them multiple times.
- I can SSH without a passphrase from each instance into the other, and from each instance into itself.
- The AWS security group allows traffic on the relevant ports.
- Based on the logs, both the NameNode and the JobTracker are running fine.
- I put the IPs of the master and slave in the config files rather than hostnames, because when I used hostnames and edited /etc/hosts accordingly, they couldn't be resolved.
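For reference, this is roughly what the relevant config entries look like on my nodes (Hadoop 1.x property names; the master's IP is redacted the same way as in the logs):

```xml
<!-- core-site.xml on both nodes: points HDFS at the NameNode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.xxx.xxx.xxx:9000</value>
  </property>
</configuration>

<!-- mapred-site.xml on both nodes: points TaskTrackers at the JobTracker -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.xxx.xxx.xxx:9001</value>
  </property>
</configuration>
```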
Does anybody know of any other possible reasons?