I want to run many jobs at the same time on a Hadoop cluster, but I want to prevent some jobs from starting their reduce phase (and keeping reduce slots busy or reserved) before all of that job's map tasks have completed. Is there a per-job configuration setting that enforces a limit like this?

Thanks.

2 Answers

Reduce slow start
By default, schedulers wait until 5% of the map tasks in a job have completed before scheduling reduce tasks for the same job. For large jobs this can cause problems with cluster utilization, since they take up reduce slots while waiting for the map tasks to complete. Setting mapred.reduce.slowstart.completed.maps to a higher value, such as 0.80 (80%), can help improve throughput.

Reference: Hadoop: The Definitive Guide, 3rd edition, Chapter 9: Setting Up a Hadoop Cluster, p. 316.
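To make the effect of the fraction concrete, here is a simplified model (an illustration, not Hadoop's scheduler code) of how many map tasks must finish before reduces become eligible for scheduling. The helper names reduce_slowstart_threshold and reduces_may_start are hypothetical. Under this model, a fraction of 1.0 means reduces wait until every map task has completed, which is the behaviour asked about in the question.

```python
import math


def reduce_slowstart_threshold(num_maps: int, slowstart_fraction: float) -> int:
    """Hypothetical helper: completed maps required before reduces may be
    scheduled, modelling mapred.reduce.slowstart.completed.maps."""
    if not 0.0 <= slowstart_fraction <= 1.0:
        raise ValueError("slowstart fraction must be between 0.0 and 1.0")
    # Round up so that a fraction of 1.0 always means "all maps done".
    return math.ceil(slowstart_fraction * num_maps)


def reduces_may_start(completed_maps: int, num_maps: int,
                      slowstart_fraction: float = 0.05) -> bool:
    """Hypothetical helper: True once enough maps have completed."""
    return completed_maps >= reduce_slowstart_threshold(num_maps, slowstart_fraction)


# A job with 200 map tasks under the default, the book's suggestion, and 1.0.
for fraction in (0.05, 0.80, 1.0):
    print(fraction, reduce_slowstart_threshold(200, fraction))
# prints:
# 0.05 10
# 0.8 160
# 1.0 200
```

The trade-off is that with 1.0 no reduce slots are held while maps run, but the shuffle can no longer overlap with the map phase, so a single job may finish somewhat later.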

The Apache Hadoop default configuration lists mapred.reduce.slowstart.completed.maps with a default value of 0.05, described as:

"Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job."
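Because this is an ordinary configuration property, it can be set in the job's configuration so that only selected jobs hold back their reducers. The sketch below builds the standard Hadoop XML property element with Python's xml.etree.ElementTree. The function name slowstart_config_xml is hypothetical, and where the snippet is placed (a job configuration file or mapred-site.xml) depends on your setup. In later Hadoop releases (MapReduce on YARN) the same setting is named mapreduce.job.reduce.slowstart.completedmaps.

```python
import xml.etree.ElementTree as ET


def slowstart_config_xml(fraction: float) -> str:
    """Hypothetical helper: render a Hadoop <configuration> block that sets
    the reduce slow-start fraction."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "mapred.reduce.slowstart.completed.maps"
    # 1.0 = schedule reduces only after all map tasks of the job have completed.
    ET.SubElement(prop, "value").text = str(fraction)
    return ET.tostring(configuration, encoding="unicode")


print(slowstart_config_xml(1.0))
# prints:
# <configuration><property><name>mapred.reduce.slowstart.completed.maps</name><value>1.0</value></property></configuration>
```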