
I have run a MapReduce job on a cluster. I created the job on HDInsight with 2 namenodes and 4 datanodes.

I did not set the number of map tasks or reduce tasks. After my MapReduce job finished, I got the result below.

I notice that the number of launched reduce tasks is 1. Does that mean my job was executed on only one node? How can I see how many nodes were used by this job?

File System Counters
                FILE: Number of bytes read=2209390166
                FILE: Number of bytes written=3314494070
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                WASB: Number of bytes read=1084887535
                WASB: Number of bytes written=1205106549
                WASB: Number of read operations=0
                WASB: Number of large read operations=0
                WASB: Number of write operations=0
        Job Counters
                Launched map tasks=2
                Launched reduce tasks=1
                Rack-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=148221
                Total time spent by all reduces in occupied slots (ms)=302038
                Total time spent by all map tasks (ms)=148221
                Total time spent by all reduce tasks (ms)=151019
                Total vcore-seconds taken by all map tasks=148221
                Total vcore-seconds taken by all reduce tasks=151019
                Total megabyte-seconds taken by all map tasks=113833728
                Total megabyte-seconds taken by all reduce tasks=231965184
        Map-Reduce Framework
                Map input records=3820642
                Map output records=3820642
                Map output bytes=1092815223
                Map output materialized bytes=1104695065
                Input split bytes=286
                Combine input records=0
                Combine output records=0
                Reduce input groups=200
                Reduce shuffle bytes=1104695065
                Reduce input records=3820642
                Reduce output records=3820642
                Spilled Records=11461926
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=16229
                CPU time spent (ms)=225140
                Physical memory (bytes) snapshot=1377296384
                Virtual memory (bytes) snapshot=5068787712
                Total committed heap usage (bytes)=1175453696
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1084887106
        File Output Format Counters
                Bytes Written=68771556
Thank you ravindra. A MapReduce job is expected to run on multiple nodes, which is why a lot of data can be processed faster than before. When the number of reduce tasks is 1, the reduce phase is executed on only one node. That is my understanding; is it right? – Frankie
Yes, exactly. You can change the number of reducers as described in the post above. – Ravindra babu

1 Answer


Yes. The number of reduce tasks for your job was 1, so there is a single reduce output and the reduce phase ran on a single node. You can see it in your counters:

                Map input records=3820642
                Map output records=3820642
                Map output bytes=1092815223
                Map output materialized bytes=1104695065
                Input split bytes=286
                Combine input records=0
                Combine output records=0
                Reduce input groups=200
                Reduce shuffle bytes=1104695065
                Reduce input records=3820642
                Reduce output records=3820642
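If you want more reduce-side parallelism, you can request more reducers explicitly in the job driver via the standard `org.apache.hadoop.mapreduce.Job` API. Below is a minimal sketch, assuming a driver like the one you already have; the class name `MyDriver` and the reducer count of 4 are illustrative, and your Mapper/Reducer classes would be set as in your existing job:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "my job");
        job.setJarByClass(MyDriver.class);
        // Mapper/Reducer/output key-value classes omitted here;
        // set them exactly as in your existing job.

        // Request 4 reduce tasks instead of the default 1, so the
        // reduce phase can be spread across the datanodes.
        job.setNumReduceTasks(4);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If your driver implements `Tool`, the same setting can also be passed on the command line with `-D mapreduce.job.reduces=4` without recompiling. To see which nodes actually ran your tasks, open the JobHistory server UI for the finished job: each map and reduce task attempt lists the node it ran on.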