When are the reducers (their number and nodes) allocated during MapReduce job execution?

Question

While reading about MapReduce, I came across the following interesting passage:

"But how do the Reducer’s know which nodes to query to get their partitions? This happens through the Application Master. As each Mapper instance completes, it notifies the Application Master about the partitions it produced during its run. Each Reducer periodically queries the Application Master for Mapper hosts until it has received a final list of nodes hosting its partitions."

I have a question here. When they say "each Reducer", what exactly does that mean? Are the reducers allocated before the map phase starts, and how are the reducer nodes chosen?

Praveen Sripati · Accepted Answer · 2015-06-22T15:11:05

Reducers can start before the mappers have finished processing the data. Once started, they pull (copy) their partitions of the map output from the mapper machines as individual map tasks complete, but they begin the actual reduce processing only after all the mappers have finished.
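To make the copy-versus-reduce split concrete, here is a minimal Python toy model of the mechanism described in the quote. It is not Hadoop code: `AppMaster`, `Reducer`, `map_completed`, `completion_events`, and the host and map IDs are hypothetical names chosen for illustration. Each reducer polls for newly finished maps and copies its partition from them, but only computes its result (here, a plain sum) once every map has reported completion.

```python
class AppMaster:
    """Hypothetical model of the Application Master's map-completion tracking."""

    def __init__(self, total_maps):
        self.total_maps = total_maps
        self.events = []  # (map_id, host) in the order maps reported completion

    def map_completed(self, map_id, host):
        # A finished mapper reports which host holds its partitioned output.
        self.events.append((map_id, host))

    def completion_events(self, start):
        # Reducers ask only for the events they have not consumed yet.
        return self.events[start:]

    def all_maps_done(self):
        return len(self.events) == self.total_maps


class Reducer:
    """Hypothetical reducer: copies eagerly, reduces only at the end."""

    def __init__(self, partition):
        self.partition = partition
        self.seen = 0       # number of completion events already consumed
        self.copied = []    # values fetched for this reducer's partition
        self.result = None  # set only once the reduce phase has run

    def poll(self, am, outputs):
        # Copy (shuffle) phase: fetch this partition from each newly finished map.
        for map_id, host in am.completion_events(self.seen):
            self.copied.extend(outputs[(host, map_id)].get(self.partition, []))
            self.seen += 1
        # Reduce phase: may only start when every map has finished.
        if am.all_maps_done() and self.result is None:
            self.result = sum(self.copied)
        return self.result


# Hypothetical map outputs: (host, map_id) -> {partition: [values]}
outputs = {
    ("node1", "m0"): {0: [1, 2], 1: [10]},
    ("node2", "m1"): {0: [3], 1: [20, 30]},
}
am = AppMaster(total_maps=2)
reducers = [Reducer(0), Reducer(1)]

am.map_completed("m0", "node1")
print([r.poll(am, outputs) for r in reducers])  # [None, None]: copying, not reducing
am.map_completed("m1", "node2")
print([r.poll(am, outputs) for r in reducers])  # [6, 60]
```

Separating the polling (copy) step from the final computation is what lets reducers overlap network transfer with the remaining map work without ever seeing an incomplete input.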

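mapred.reduce.slowstart.completed.maps is the property that configures this behaviour. It is the fraction of the job's map tasks that must be complete before reducers are scheduled (default 0.05). In Hadoop 2 and later the property is named mapreduce.job.reduce.slowstart.completedmaps; the old name is deprecated.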
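As a rough illustration of what the property controls, the sketch below models it as a simple fraction check. This is a simplification: Hadoop's real allocator also ramps reducers up gradually depending on available cluster resources, and `reducers_may_be_scheduled` and `maps_needed` are hypothetical helper names. It assumes the fraction lies between 0 and 1. In practice the value can be set per job, for example with `-D mapreduce.job.reduce.slowstart.completedmaps=1.0` on the command line when the driver uses `ToolRunner`, which keeps reducers from being scheduled until all maps are done.

```python
# Documented default of mapreduce.job.reduce.slowstart.completedmaps.
DEFAULT_SLOWSTART = 0.05


def reducers_may_be_scheduled(completed_maps, total_maps, slowstart=DEFAULT_SLOWSTART):
    """Simplified check (hypothetical helper): have enough maps finished?"""
    if total_maps == 0:
        return True
    return completed_maps / total_maps >= slowstart


def maps_needed(total_maps, slowstart=DEFAULT_SLOWSTART):
    """Smallest completed-map count that passes the check above.

    Assumes 0 <= slowstart <= 1. Reusing the predicate (instead of
    ceil(slowstart * total_maps)) avoids floating-point surprises
    such as 0.07 * 100 evaluating to 7.000000000000001.
    """
    return next(k for k in range(total_maps + 1)
                if reducers_may_be_scheduled(k, total_maps, slowstart))


print(maps_needed(100))        # 5: default, reducers requested early
print(maps_needed(200, 0.8))   # 160
print(maps_needed(100, 1.0))   # 100: reducers wait for every map
```

A low value lets the copy phase overlap with mapping but can tie up containers with reducers that sit idle; a value near 1.0 frees those resources for maps at the cost of less overlap.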
