
I was performance testing a MapReduce program on a cluster of 10 data nodes. I started with 5 reducers, then 10, and so on.

I expected that increasing the number of reducers would also make the job finish faster, but what I observed is that beyond 30 reducers, even going up to 70, the job runtime did not change. Since there are more than 400 unique keys, shouldn't increasing the number of reducers beyond 30 make a difference?
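For context on the 400-keys point: Hadoop's default HashPartitioner routes each key to reducer `hash(key) % numReducers`, so the unique keys get spread across reducers. The sketch below (using Python's built-in `hash` on made-up key names, purely for illustration, not Hadoop's actual Java hash) shows how 400 keys spread out for different reducer counts:

```python
# Hadoop's default HashPartitioner assigns a key to reducer
# hash(key) % num_reducers. This sketch uses Python's hash of
# synthetic keys ("key-0" .. "key-399") purely for illustration.

def keys_per_reducer(num_keys: int, num_reducers: int) -> list:
    """Count how many distinct keys land on each reducer."""
    counts = [0] * num_reducers
    for i in range(num_keys):
        counts[hash(f"key-{i}") % num_reducers] += 1
    return counts

for r in (30, 70):
    counts = keys_per_reducer(400, r)
    print(r, "reducers -> min/max keys per reducer:", min(counts), max(counts))
```

With 400 keys there is indeed enough key-level parallelism for 70 reducers (several keys per reducer either way), which is why the lack of speedup has to come from somewhere else, as the answer below discusses.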

Can anyone help me understand whether there could be a particular reason for this?

Thanks in advance

According to my calculation, the job should keep improving up to 40 reducers and then degrade or stop improving. Can you please share your findings? – saurabh shashank
The findings are these: with 40 reducers the job finishes in around 39 minutes, and with 60 reducers it finishes in around 48 minutes. That's what confused me; it should have finished in less time with more reducers. That's what I was expecting. – javanx
Hey, I will answer the question as soon as I can, with all the theoretical MapReduce mathematics. – saurabh shashank

1 Answer


There are multiple reasons why you might not observe an increase in performance. What benchmarks are you running when you do this performance testing? Also, how many mappers do you have? What is the size of your job, etc.?

From what you briefly described here, I suspect that you don't have enough mappers and/or they finish too slowly.

Keep in mind that reducers can only start their work once there is output available from the mappers. It may therefore be the case that your 30 reducers are already plenty to handle the output rate and size of your mappers, so adding more makes no difference: most of them sit idle, waiting for the mappers to finish and provide fresh data to process.
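The idle-reducer effect can be sketched with a toy cost model (all numbers below are illustrative assumptions, not measurements from this cluster): the reduce phase is gated by how fast the map phase delivers data, while per-reducer startup and scheduling overhead grows with the reducer count. Past some point, adding reducers therefore makes the job slower, not faster:

```python
# Toy model of MapReduce runtime. The reduce phase cannot finish before
# the mappers do, and each extra reducer adds some fixed startup and
# scheduling overhead. All constants are illustrative assumptions.

MAP_PHASE_MINUTES = 30        # assumed time for the mappers to finish
REDUCE_WORK_MINUTES = 300     # assumed total reduce work, perfectly divisible
OVERHEAD_PER_REDUCER = 0.25   # assumed startup/scheduling cost per reducer

def job_runtime(num_reducers: int) -> float:
    """Rough estimate: parallel reduce work shrinks with more reducers,
    but it is floored by the map phase, and overhead keeps growing."""
    parallel_reduce = REDUCE_WORK_MINUTES / num_reducers
    overhead = OVERHEAD_PER_REDUCER * num_reducers
    return max(MAP_PHASE_MINUTES, parallel_reduce) + overhead

for n in (5, 10, 30, 60):
    print(n, "reducers ->", job_runtime(n), "min")
```

In this sketch the runtime improves up to the point where the mappers become the bottleneck, then degrades as overhead accumulates, which is the same shape as the 39-minute vs 48-minute observation in the comments.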

I hope this helps!