I ran a Hadoop MapReduce wordcount job on 1 MB of data, and I have some questions about the output below:
- What is a counter?
- Why are there two map tasks? As far as I know, the number of map tasks is determined by the number of input splits, and the minimum input split size is 64 MB, so logically there should be only one map task.
- What is the size of the output data from the reducers?
- CPU time spent: which CPU does this refer to, given that each TaskTracker has its own CPU and memory?
Thanks a lot!
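For the map-count question above, here is a minimal sketch of the split arithmetic as I understand it from the old `org.apache.hadoop.mapred.FileInputFormat` (the class named in the log). It is written in Python purely as an illustration, not as Hadoop code. The assumptions are mine: the file length is taken to be roughly the `BYTES_READ` value, the block size is 64 MB, `mapred.map.tasks` is left at what I believe is its MRv1 default of 2, and the effective minimum split size is 1 byte. The function names are hypothetical.

```python
# Illustrative re-creation of the old-API split computation (assumed, not copied
# from any particular Hadoop release).
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(goal_size, min_size, block_size):
    # The split size is capped by the block size and floored by the minimum size.
    return max(min_size, min(goal_size, block_size))


def split_lengths(file_length, num_splits_hint, block_size, min_size=1):
    # goal_size is the file length divided by the requested number of map tasks.
    goal_size = file_length // (num_splits_hint or 1)
    split_size = compute_split_size(goal_size, min_size, block_size)

    lengths = []
    remaining = file_length
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    if remaining != 0:
        lengths.append(remaining)
    return lengths


# Assumed inputs: ~1 MB file, hint of 2 map tasks, 64 MB block size.
lengths = split_lengths(1002479, 2, 64 * 1024 * 1024)
print(len(lengths), lengths)
```

Under these assumptions the file is cut into two splits of about 500 KB each, which would match `Launched map tasks=2`. With a hint of 1 the same function yields a single split. In this reading, 64 MB acts as an upper bound on the split size rather than a minimum.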
[user1@li417-43 ~]$ hadoop jar wordcount1.jar wordcount1.WordCount -D mapred.reduce.tasks=10 wordin wordout10-1m
14/12/16 19:55:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/12/16 19:55:46 INFO mapred.FileInputFormat: Total input paths to process : 1
14/12/16 19:55:46 INFO mapred.JobClient: Running job: job_201405031326_0032
14/12/16 19:55:47 INFO mapred.JobClient: map 0% reduce 0%
14/12/16 19:55:59 INFO mapred.JobClient: map 100% reduce 0%
14/12/16 19:56:04 INFO mapred.JobClient: map 100% reduce 40%
14/12/16 19:56:09 INFO mapred.JobClient: map 100% reduce 80%
14/12/16 19:56:14 INFO mapred.JobClient: map 100% reduce 100%
14/12/16 19:56:15 INFO mapred.JobClient: Job complete: job_201405031326_0032
14/12/16 19:56:15 INFO mapred.JobClient: Counters: 34
14/12/16 19:56:15 INFO mapred.JobClient: File System Counters
14/12/16 19:56:15 INFO mapred.JobClient: FILE: Number of bytes read=2008100
14/12/16 19:56:15 INFO mapred.JobClient: FILE: Number of bytes written=5988058
14/12/16 19:56:15 INFO mapred.JobClient: FILE: Number of read operations=0
14/12/16 19:56:15 INFO mapred.JobClient: FILE: Number of large read operations=0
14/12/16 19:56:15 INFO mapred.JobClient: FILE: Number of write operations=0
14/12/16 19:56:15 INFO mapred.JobClient: HDFS: Number of bytes read=1005254
14/12/16 19:56:15 INFO mapred.JobClient: HDFS: Number of bytes written=140119
14/12/16 19:56:15 INFO mapred.JobClient: HDFS: Number of read operations=14
14/12/16 19:56:15 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/12/16 19:56:15 INFO mapred.JobClient: HDFS: Number of write operations=20
14/12/16 19:56:15 INFO mapred.JobClient: Job Counters
14/12/16 19:56:15 INFO mapred.JobClient: Launched map tasks=2
14/12/16 19:56:15 INFO mapred.JobClient: Launched reduce tasks=10
14/12/16 19:56:15 INFO mapred.JobClient: Data-local map tasks=1
14/12/16 19:56:15 INFO mapred.JobClient: Rack-local map tasks=1
14/12/16 19:56:15 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=12953
14/12/16 19:56:15 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=49609
14/12/16 19:56:15 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/12/16 19:56:15 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/12/16 19:56:15 INFO mapred.JobClient: Map-Reduce Framework
14/12/16 19:56:15 INFO mapred.JobClient: Map input records=35293
14/12/16 19:56:15 INFO mapred.JobClient: Map output records=181014
14/12/16 19:56:15 INFO mapred.JobClient: Map output bytes=1646012
14/12/16 19:56:15 INFO mapred.JobClient: Input split bytes=206
14/12/16 19:56:15 INFO mapred.JobClient: Combine input records=0
14/12/16 19:56:15 INFO mapred.JobClient: Combine output records=0
14/12/16 19:56:15 INFO mapred.JobClient: Reduce input groups=14276
14/12/16 19:56:15 INFO mapred.JobClient: Reduce shuffle bytes=2008160
14/12/16 19:56:15 INFO mapred.JobClient: Reduce input records=181014
14/12/16 19:56:15 INFO mapred.JobClient: Reduce output records=14276
14/12/16 19:56:15 INFO mapred.JobClient: Spilled Records=362028
14/12/16 19:56:15 INFO mapred.JobClient: CPU time spent (ms)=26020
14/12/16 19:56:15 INFO mapred.JobClient: Physical memory (bytes) snapshot=1427562496
14/12/16 19:56:15 INFO mapred.JobClient: Virtual memory (bytes) snapshot=8291246080
14/12/16 19:56:15 INFO mapred.JobClient: Total committed heap usage (bytes)=477896704
14/12/16 19:56:15 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
14/12/16 19:56:15 INFO mapred.JobClient: BYTES_READ=1002479
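The counter lines in the log above are plain `name=value` pairs, so they can be pulled into a dictionary for inspection. The following is a minimal sketch, assuming the `INFO mapred.JobClient:` line format shown above; `parse_counters` is a hypothetical helper, not part of Hadoop, and the sample lines are copied from the log.

```python
import re

# Matches "... INFO mapred.JobClient: <counter name>=<integer value>"
COUNTER_LINE = re.compile(r"INFO mapred\.JobClient:\s+(?P<name>.+?)=(?P<value>\d+)\s*$")


def parse_counters(log_text):
    """Return {counter name: integer value} for every counter line in log_text."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_LINE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


sample = """\
14/12/16 19:56:15 INFO mapred.JobClient: Counters: 34
14/12/16 19:56:15 INFO mapred.JobClient: HDFS: Number of bytes written=140119
14/12/16 19:56:15 INFO mapred.JobClient: Launched map tasks=2
14/12/16 19:56:15 INFO mapred.JobClient: Launched reduce tasks=10
14/12/16 19:56:15 INFO mapred.JobClient: CPU time spent (ms)=26020
"""

counters = parse_counters(sample)
tasks = counters["Launched map tasks"] + counters["Launched reduce tasks"]
print(counters["HDFS: Number of bytes written"], counters["CPU time spent (ms)"], tasks)
```

As far as I understand, job-level counters are sums over all tasks, so `CPU time spent (ms)` would be the total across the 12 launched tasks rather than the time on any single machine, and `HDFS: Number of bytes written` would be the closest figure to the size of the reducers' output.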