
I want to control the number of records processed by each mapper.

In my cluster, some data nodes hold more records than others, so the mappers created on those nodes process more records and run for a long time.

Mapper processing time does not depend on record size; the number of records determines the time. So is there any way to control the number of records processed by each mapper?


1 Answer


You can supply -D mapreduce.input.fileinputformat.split.maxsize=<size in bytes> on the command line. You can arrive at this number by deciding how many records each mapper should process and multiplying that by the average record size. Note that -D options are only picked up if the job's driver uses ToolRunner/GenericOptionsParser.
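A quick sketch of the arithmetic (the record count, record size, jar name, class, and paths below are hypothetical placeholders; substitute your own measurements):

```shell
# Hypothetical figures: each mapper should handle ~1,000,000 records,
# and the average record is ~200 bytes.
RECORDS_PER_MAPPER=1000000
AVG_RECORD_SIZE=200

# Target split size in bytes = records per mapper * average record size.
SPLIT_MAXSIZE=$((RECORDS_PER_MAPPER * AVG_RECORD_SIZE))
echo "$SPLIT_MAXSIZE"   # 200000000 bytes, i.e. roughly 200 MB per split

# Then pass it to the job (jar, class, and paths are placeholders):
#   hadoop jar myjob.jar MyJob \
#     -D mapreduce.input.fileinputformat.split.maxsize=$SPLIT_MAXSIZE \
#     /input /output
```

Each input split then caps out near that byte size, so each mapper sees roughly the target number of records, regardless of which node the data lives on.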