0 votes

I ran the Hadoop MapReduce WordCount job on 1 MB of data. I have some questions about the job output below:

  • What is a counter?

  • Why are there two map tasks? As I understand it, the number of map tasks is decided by the number of input splits, and the minimum input split size is 64 MB, so logically there should be only one map task.

  • What is the size of the output data from the reducers?

  • For CPU time spent: which CPU does this refer to, given that each TaskTracker has its own CPU and memory?

Thanks a lot!

[user1@li417-43 ~]$ hadoop jar wordcount1.jar wordcount1.WordCount -D mapred.reduce.tasks=10 wordin wordout10-1m
    14/12/16 19:55:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    14/12/16 19:55:46 INFO mapred.FileInputFormat: Total input paths to process : 1
    14/12/16 19:55:46 INFO mapred.JobClient: Running job: job_201405031326_0032
    14/12/16 19:55:47 INFO mapred.JobClient:  map 0% reduce 0%
    14/12/16 19:55:59 INFO mapred.JobClient:  map 100% reduce 0%
    14/12/16 19:56:04 INFO mapred.JobClient:  map 100% reduce 40%
    14/12/16 19:56:09 INFO mapred.JobClient:  map 100% reduce 80%
    14/12/16 19:56:14 INFO mapred.JobClient:  map 100% reduce 100%
    14/12/16 19:56:15 INFO mapred.JobClient: Job complete: job_201405031326_0032
    14/12/16 19:56:15 INFO mapred.JobClient: Counters: 34
    14/12/16 19:56:15 INFO mapred.JobClient:   File System Counters
    14/12/16 19:56:15 INFO mapred.JobClient:     FILE: Number of bytes read=2008100
    14/12/16 19:56:15 INFO mapred.JobClient:     FILE: Number of bytes written=5988058
    14/12/16 19:56:15 INFO mapred.JobClient:     FILE: Number of read operations=0
    14/12/16 19:56:15 INFO mapred.JobClient:     FILE: Number of large read operations=0
    14/12/16 19:56:15 INFO mapred.JobClient:     FILE: Number of write operations=0
    14/12/16 19:56:15 INFO mapred.JobClient:     HDFS: Number of bytes read=1005254
    14/12/16 19:56:15 INFO mapred.JobClient:     HDFS: Number of bytes written=140119
    14/12/16 19:56:15 INFO mapred.JobClient:     HDFS: Number of read operations=14
    14/12/16 19:56:15 INFO mapred.JobClient:     HDFS: Number of large read operations=0
    14/12/16 19:56:15 INFO mapred.JobClient:     HDFS: Number of write operations=20
    14/12/16 19:56:15 INFO mapred.JobClient:   Job Counters
    14/12/16 19:56:15 INFO mapred.JobClient:     Launched map tasks=2
    14/12/16 19:56:15 INFO mapred.JobClient:     Launched reduce tasks=10
    14/12/16 19:56:15 INFO mapred.JobClient:     Data-local map tasks=1
    14/12/16 19:56:15 INFO mapred.JobClient:     Rack-local map tasks=1
    14/12/16 19:56:15 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=12953
    14/12/16 19:56:15 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=49609
    14/12/16 19:56:15 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    14/12/16 19:56:15 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    14/12/16 19:56:15 INFO mapred.JobClient:   Map-Reduce Framework
    14/12/16 19:56:15 INFO mapred.JobClient:     Map input records=35293
    14/12/16 19:56:15 INFO mapred.JobClient:     Map output records=181014
    14/12/16 19:56:15 INFO mapred.JobClient:     Map output bytes=1646012
    14/12/16 19:56:15 INFO mapred.JobClient:     Input split bytes=206
    14/12/16 19:56:15 INFO mapred.JobClient:     Combine input records=0
    14/12/16 19:56:15 INFO mapred.JobClient:     Combine output records=0
    14/12/16 19:56:15 INFO mapred.JobClient:     Reduce input groups=14276
    14/12/16 19:56:15 INFO mapred.JobClient:     Reduce shuffle bytes=2008160
    14/12/16 19:56:15 INFO mapred.JobClient:     Reduce input records=181014
    14/12/16 19:56:15 INFO mapred.JobClient:     Reduce output records=14276
    14/12/16 19:56:15 INFO mapred.JobClient:     Spilled Records=362028
    14/12/16 19:56:15 INFO mapred.JobClient:     CPU time spent (ms)=26020
    14/12/16 19:56:15 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1427562496
    14/12/16 19:56:15 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=8291246080
    14/12/16 19:56:15 INFO mapred.JobClient:     Total committed heap usage (bytes)=477896704
    14/12/16 19:56:15 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
    14/12/16 19:56:15 INFO mapred.JobClient:     BYTES_READ=1002479

1 Answer

1 vote
  1. Counters: 34 is the number of counters reported for this job. Each line below the Counters: 34 header is one counter, i.e. a named metric that Hadoop collects per task and aggregates over the whole job, grouped into File System Counters, Job Counters, and so on.
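
You can also read a single counter back from the command line while the JobTracker still knows about the job. For example, reusing the job id, counter group, and counter name that appear verbatim in the log above, this should print 1002479:

    hadoop job -counter job_201405031326_0032 \
        org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter BYTES_READ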

  2. I think this is due to speculative execution (search for "speculative" on https://developer.yahoo.com/hadoop/tutorial/module4.html). Hadoop launches the same mapper twice to see which one finishes first, and then kills the slower one. You can disable it by setting the mapred.map.tasks.speculative.execution configuration property to false in the mapred-site.xml file.

One mapper was launched on a node that holds the data; the other ran on a different node in the same rack, which matches the counters Data-local map tasks=1 and Rack-local map tasks=1.
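
If you only want to turn speculative execution off for a single run, you can also pass the same property with -D, just as the command in the question already does for mapred.reduce.tasks (the output directory name below is only an example):

    hadoop jar wordcount1.jar wordcount1.WordCount \
        -D mapred.map.tasks.speculative.execution=false \
        -D mapred.reduce.tasks=10 wordin wordout-nospec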

  3. Your reducers produced 14276 output lines in total (Reduce output records=14276). The size in bytes shows up under HDFS: Number of bytes written=140119, i.e. roughly 140 KB, essentially the ten part files written by your ten reducers.
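
You can confirm the output size directly on HDFS; wordout10-1m is the output directory from the command in the question:

    hadoop fs -du wordout10-1m      # size of each part file, one per reducer
    hadoop fs -dus wordout10-1m     # total size of the output directory (Hadoop 1.x syntax)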

  4. CPU time spent (ms) is not tied to a single CPU: it is the sum of the CPU time consumed by every task of the job, on whichever node each task happened to run. It is mostly useful for comparing the cost of different jobs or configurations.