
I am trying out Hadoop MapReduce on a two-node Linux cluster (Ubuntu virtual machines) by following this tutorial.

When I run the wordcount MapReduce example, the tasks are not run on the slave. Can you help me identify the problem?

Please find my logs and output files.

jps output on the master:

hduser@master:/usr/local/hadoop$ jps
8056 NodeManager
8696 Jps
7471 NameNode
7592 DataNode
7793 SecondaryNameNode
7933 ResourceManager

jps output on the slave:

hduser@slave:/usr/local/hadoop$ jps
3634 NodeManager
3518 DataNode
3722 Jps

Output from running the jar file:

hduser@master:/usr/local/hadoop$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/hduser/WordCount/ /user/hduser/WordCount/MultiNode_Output
15/11/30 04:07:16 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/11/30 04:07:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/11/30 04:07:17 INFO input.FileInputFormat: Total input paths to process : 3
15/11/30 04:07:17 INFO mapreduce.JobSubmitter: number of splits:3
15/11/30 04:07:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1403989830_0001
15/11/30 04:07:18 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/11/30 04:07:18 INFO mapreduce.Job: Running job: job_local1403989830_0001
15/11/30 04:07:18 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/11/30 04:07:18 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/11/30 04:07:18 INFO mapred.LocalJobRunner: Waiting for map tasks
15/11/30 04:07:18 INFO mapred.LocalJobRunner: Starting task: attempt_local1403989830_0001_m_000000_0
15/11/30 04:07:18 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/11/30 04:07:18 INFO mapred.MapTask: Processing split: hdfs://master:54310/user/hduser/WordCount/pg4300.txt:0+1573151
15/11/30 04:07:19 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/11/30 04:07:19 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/11/30 04:07:19 INFO mapred.MapTask: soft limit at 83886080
15/11/30 04:07:19 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/11/30 04:07:19 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/11/30 04:07:19 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/11/30 04:07:19 INFO mapreduce.Job: Job job_local1403989830_0001 running in uber mode : false
15/11/30 04:07:19 INFO mapreduce.Job:  map 0% reduce 0%
15/11/30 04:07:19 INFO input.LineRecordReader: Found UTF-8 BOM and skipped it
15/11/30 04:07:20 INFO mapred.LocalJobRunner: 
15/11/30 04:07:20 INFO mapred.MapTask: Starting flush of map output
15/11/30 04:07:20 INFO mapred.MapTask: Spilling map output
15/11/30 04:07:20 INFO mapred.MapTask: bufstart = 0; bufend = 2601881; bufvoid = 104857600
15/11/30 04:07:20 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 25142496(100569984); length = 1071901/6553600
15/11/30 04:07:22 INFO mapred.MapTask: Finished spill 0
15/11/30 04:07:22 INFO mapred.Task: Task:attempt_local1403989830_0001_m_000000_0 is done. And is in the process of committing
15/11/30 04:07:22 INFO mapred.LocalJobRunner: map
15/11/30 04:07:22 INFO mapred.Task: Task 'attempt_local1403989830_0001_m_000000_0' done.
15/11/30 04:07:22 INFO mapred.LocalJobRunner: Finishing task: attempt_local1403989830_0001_m_000000_0
15/11/30 04:07:22 INFO mapred.LocalJobRunner: Starting task: attempt_local1403989830_0001_m_000001_0
15/11/30 04:07:22 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/11/30 04:07:22 INFO mapred.MapTask: Processing split: hdfs://master:54310/user/hduser/WordCount/5000-8.txt:0+1428841
…
…
15/11/30 04:07:27 INFO mapred.LocalJobRunner: reduce task executor complete.
15/11/30 04:07:27 INFO mapreduce.Job:  map 100% reduce 100%
15/11/30 04:07:28 INFO mapreduce.Job: Job job_local1403989830_0001 completed successfully
15/11/30 04:07:28 INFO mapreduce.Job: Counters: 38
    File System Counters
        FILE: Number of bytes read=4010919
        FILE: Number of bytes written=8405245
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=11928267
        HDFS: Number of bytes written=883509
        HDFS: Number of read operations=37
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=6
    Map-Reduce Framework
        Map input records=78578
        Map output records=629920
        Map output bytes=6083556
        Map output materialized bytes=1462980
        Input split bytes=352
        Combine input records=629920
        Combine output records=101397
        Reduce input groups=82616
        Reduce shuffle bytes=1462980
        Reduce input records=101397
        Reduce output records=82616
        Spilled Records=202794
        Shuffled Maps =3
        Failed Shuffles=0
        Merged Map outputs=3
        GC time elapsed (ms)=433
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=657997824
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=3676562
    File Output Format Counters 
        Bytes Written=883509

hadoop-hduser-datanode-slave.log:

2015-11-30 04:13:56,747 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: slave/192.168.56.102:54310. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-11-30 04:13:56,748 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: slave/192.168.56.102:54310

yarn-hduser-nodemanager-slave.log:

2015-11-30 04:09:39,366 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 127.0.0.1/127.0.0.1:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-11-30 04:09:40,367 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 127.0.0.1/127.0.0.1:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-11-30 04:09:41,368 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 127.0.0.1/127.0.0.1:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-11-30 04:09:42,369 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 127.0.0.1/127.0.0.1:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-11-30 04:09:43,370 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 127.0.0.1/127.0.0.1:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
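The retries against 127.0.0.1:8031 above show the slave's NodeManager looking for the ResourceManager's resource-tracker port on localhost, which suggests yarn.resourcemanager.hostname is not set on the slave and is falling back to its default. A minimal sketch of the relevant yarn-site.xml property, assuming the ResourceManager runs on the host named master (adjust the value to your setup):

```xml
<!-- yarn-site.xml (on every node): point NodeManagers at the master's
     ResourceManager instead of the 0.0.0.0/127.0.0.1 default -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
```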

When I go to http://localhost:50070/, I can see only one Live Node, which is my master.

Sounds like a connectivity (firewall?) problem. Check that your Hadoop ports (cloudera.com/content/www/en-us/documentation/enterprise/latest/…) are unblocked and reachable from all the nodes in your cluster. – highlycaffeinated

From my master I am able to ping my slave as well as ssh into it. If it were a firewall problem, that shouldn't work, right? Similarly, from my slave I am able to ping and ssh to my master. – Anit

A firewall could be allowing ping packets and ssh connections but still blocking connections to other ports. From the logs above, your datanode is trying and failing to connect to a host at 192.168.56.102 on port 54310, and your slave's NodeManager is trying and failing to connect to localhost on port 8031. Are you sure your slave node is configured with the correct master node name/IP address? – highlycaffeinated
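Since successful ping and ssh do not rule out individually blocked ports, a quick way to check the specific ports from the logs is a small TCP probe run from each node (a sketch; the master hostname and ports 54310/8031 are taken from the logs above and may differ on your cluster):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, timed out, and unresolvable hosts
        return False

# Endpoints taken from the logs above (assumptions; adjust to your cluster).
for host, port in [("master", 54310), ("master", 8031)]:
    status = "reachable" if port_open(host, port) else "NOT reachable"
    print(f"{host}:{port} {status}")
```

If a port is reachable from the master itself but not from the slave, the daemon is likely bound to 127.0.0.1 only, which points back at the /etc/hosts or Hadoop hostname configuration rather than a firewall.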

1 Answer


I guess the problem should be solved if all of the following checks pass:

  • Stop the cluster (run jps to confirm that no Hadoop daemons are still running).
  • Change the /etc/hosts file as shown below, where <localIpOfMaster> should be something like 127.0.0.1:
<localIpOfMaster> localhost
<publicIpOfMaster> master
<ipOfSlave> slave1
  • In the slaves file of the Hadoop configuration, use the same name for the slave (here slave1).
  • Make sure that you can ssh to the slave without a password, using the same name as in the /etc/hosts file and the slaves file (here ssh slave1).
  • Start the cluster again and check with jps that everything is running; check the logs again to verify that there are no connection errors.
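The hostname part of this checklist can be sanity-checked with a short script (a sketch; master and slave1 are the example names from the /etc/hosts template above and may differ on your cluster):

```python
import socket

def resolve(name):
    """Return the IPv4 address a name resolves to, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None

# Example names from the /etc/hosts template above (assumptions; adjust).
for name in ["localhost", "master", "slave1"]:
    ip = resolve(name)
    print(name, "->", ip if ip else "UNRESOLVED (check /etc/hosts)")
```

Run it on every node: on the slave, master must resolve to the master's real network address (not 127.0.0.1), or the DataNode and NodeManager will end up trying to connect to themselves.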

Hope this helps. If not, since it has been a while since you posted this and the problem may already be fixed, please share the solution you found so that others facing the same issue can benefit from it.