
While reading about MapReduce, I came across the following passage:

"But how do the Reducer’s know which nodes to query to get their partitions? This happens through the Application Master. As each Mapper instance completes, it notifies the Application Master about the partitions it produced during its run. Each Reducer periodically queries the Application Master for Mapper hosts until it has received a final list of nodes hosting its partitions."

I have a question about this. What exactly does "each Reducer" mean here? Are the reducers allocated before the map phase starts, and how are the reducer nodes chosen?


1 Answer


Reducers can be started before all of the map tasks have finished processing their data. Once started, a reducer can begin pulling its partitions from the mapper machines as each map completes, but it only starts the actual reduce processing after all of the mappers are done.

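mapred.reduce.slowstart.completed.maps is the property that configures this behaviour (in Hadoop 2 and later it is named mapreduce.job.reduce.slowstart.completedmaps). Its value is the fraction of map tasks that must complete before reducers are scheduled, with a default of 0.05. More information on the property here.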
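
To make the two thresholds concrete, here is a minimal sketch in plain Java. It is not Hadoop's scheduler code: the class and method names are hypothetical, and rounding the threshold up with Math.ceil is an assumption made for illustration. It only models the rule described above: reducers become eligible once the configured fraction of maps has completed, while the reduce function itself waits for every map.

```java
public class SlowstartSketch {

    // Hypothetical helper: how many map tasks must finish before reducers
    // become eligible to be scheduled, given the slowstart fraction.
    // Rounding up is an assumption of this sketch.
    static int mapsNeededBeforeReducers(int totalMaps, double slowstartFraction) {
        return (int) Math.ceil(slowstartFraction * totalMaps);
    }

    // Reducers may be launched (and start copying map output) once enough
    // maps have completed.
    static boolean reducersMayStart(int totalMaps, int completedMaps,
                                    double slowstartFraction) {
        return completedMaps >= mapsNeededBeforeReducers(totalMaps, slowstartFraction);
    }

    // The reduce function needs every partition, so it can only run after
    // all maps are done, regardless of the slowstart setting.
    static boolean reduceFunctionMayRun(int totalMaps, int completedMaps) {
        return completedMaps == totalMaps;
    }

    public static void main(String[] args) {
        int totalMaps = 10;
        double slowstartFraction = 0.5; // illustrative value, not the default

        for (int done = 0; done <= totalMaps; done++) {
            System.out.printf("maps done=%d reducersMayStart=%b reduceFunctionMayRun=%b%n",
                    done,
                    reducersMayStart(totalMaps, done, slowstartFraction),
                    reduceFunctionMayRun(totalMaps, done));
        }
    }
}
```

In a real job the fraction would be set through the job configuration, for example with Configuration.setFloat on the property name above. A value of 1.0 makes reducers wait for all maps before they are scheduled, and a value of 0.0 lets them be scheduled right away.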