I am reading the original MapReduce paper. My understanding is that when working with, say, hundreds of GBs of data, the network bandwidth needed to transfer that data can become the bottleneck of a MapReduce job. For map tasks, we can reduce network traffic by scheduling each map task on a worker that already stores the data for its input split, since reading from local disk does not require network bandwidth.
However, the shuffle phase seems to be a huge bottleneck. A reduce task can potentially receive intermediate key/value pairs from all map tasks, and almost all of these intermediate key/value pairs will be streamed across the network.
When working with hundreds of GBs of data or more, is it necessary to use a combiner to have an efficient MapReduce job?
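To make concrete what I mean, here is a minimal single-process sketch of a word count. It is only a toy simulation, not the paper's implementation: the function names (`map_words`, `combine`, `shuffle_and_reduce`) and the two tiny input splits are made up for illustration. It counts how many intermediate pairs would have to cross the network with and without a combiner.

```python
from collections import Counter, defaultdict

def map_words(split):
    """Map phase: emit one (word, 1) pair per word in the split."""
    return [(word, 1) for word in split.split()]

def combine(pairs):
    """Combiner: pre-aggregate pairs locally on the map worker."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())

def shuffle_and_reduce(map_outputs):
    """Shuffle: group values by key across all map outputs; reduce: sum each group."""
    groups = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input: each string stands in for one input split.
splits = ["a b a a", "b b a c"]

raw = [map_words(s) for s in splits]       # map output, no combiner
combined = [combine(p) for p in raw]       # map output after the combiner

# Number of intermediate pairs that would be sent to reducers.
pairs_without = sum(len(p) for p in raw)
pairs_with = sum(len(p) for p in combined)

result_without = shuffle_and_reduce(raw)
result_with = shuffle_and_reduce(combined)

print(pairs_without, pairs_with)
print(result_without == result_with)
```

In this toy case the combiner shrinks the shuffled data only because keys repeat within a split and the reduce function (a sum) is commutative and associative, so partial sums can be merged safely. With mostly unique keys per split, the combiner would save little.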