
I want to control the number of records processed by each mapper.

In my cluster, some data nodes hold more records than others, so the mappers created on those nodes process more records and run for a long time.

Mapper processing time does not depend on record size; the number of records determines the time. So is there any way to control the number of records processed by each mapper?


1 Answer


You can supply -D mapreduce.input.fileinputformat.split.maxsize=<size in bytes> on the command line. You can arrive at this number by deciding how many records each mapper should process and multiplying that by the average record size. Note that -D options are only picked up if the job's driver uses ToolRunner/GenericOptionsParser.
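A quick sketch of the arithmetic (the record count, record size, jar name, class, and paths below are hypothetical placeholders; substitute your own measurements):

```shell
# Hypothetical figures: each mapper should handle ~1,000,000 records,
# and the average record is ~200 bytes.
RECORDS_PER_MAPPER=1000000
AVG_RECORD_SIZE=200

# Target split size in bytes = records per mapper * average record size.
SPLIT_MAXSIZE=$((RECORDS_PER_MAPPER * AVG_RECORD_SIZE))
echo "$SPLIT_MAXSIZE"   # 200000000 bytes, i.e. roughly 200 MB per split

# Then pass it to the job (jar, class, and paths are placeholders):
#   hadoop jar myjob.jar MyJob \
#     -D mapreduce.input.fileinputformat.split.maxsize=$SPLIT_MAXSIZE \
#     /input /output
```

Each input split then caps out near that byte size, so each mapper sees roughly the target number of records, regardless of which node the data lives on.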