I set up a 2-node Hadoop cluster, and running start-dfs.sh and start-yarn.sh works nicely (i.e. all expected services are running and there are no errors in the logs).
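For reference, this is how I verified that the daemons are up, using jps on each node (the expected daemon lists below assume a stock 2-node layout with the master running HDFS and YARN masters and the slave running the workers):

    # run on each node to list the running Hadoop JVMs
    # master should show (roughly): NameNode, SecondaryNameNode, ResourceManager
    # slave should show (roughly):  DataNode, NodeManager
    jps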
However, when I actually try to run an application, several tasks fail:
15/04/01 15:27:53 INFO mapreduce.Job: Task Id : attempt_1427894767376_0001_m_000008_2, Status : FAILED
I checked the YARN and datanode logs, but nothing is reported there. In the userlogs, the syslog files on the slave node all contain the following error message:
2015-04-01 15:27:21,077 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: slave.domain.be./127.0.1.1:53834. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-04-01 15:27:21,078 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From slave.domain.be./127.0.1.1 to slave.domain.be.:53834 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
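To be precise about which files I mean (assuming the default yarn.nodemanager.log-dirs location; the application ID is taken from the failed attempt above):

    # container syslogs on the slave, assuming the default log directory layout
    ls $HADOOP_HOME/logs/userlogs/application_1427894767376_0001/container_*/syslog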
So the problem is that the slave node cannot connect to itself.
I checked whether there is a process on the slave node listening on port 53834, but there is none. However, all 'expected' ports (50020, 50075, ...) are being listened on. Nowhere in my configuration have I used port 53834, and it is a different port on every run. The check I used is shown below.
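For what it's worth, this is how I checked (the port number is just the one from this particular run, since it changes every time):

    # run on the slave: look for a listener on the failing port
    sudo netstat -tlnp | grep 53834              # produces no output
    # the standard HDFS daemon ports are listening as expected
    sudo netstat -tlnp | grep -E ':50020|:50075'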
Any ideas on fixing this issue?