
I am writing a Hadoop MapReduce job that feeds its mappers from two Scans over a single HBase table. The table has 10 regions. I create the two scanners and call setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, tableName) on each.
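
The scan setup looks essentially like this (simplified; the start/stop rows and filters are omitted, and the variable names are just for illustration):

    // Bytes is org.apache.hadoop.hbase.util.Bytes
    List<Scan> scans = new ArrayList<Scan>();

    Scan scan1 = new Scan();
    scan1.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName));
    scans.add(scan1);

    Scan scan2 = new Scan();
    scan2.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName));
    scans.add(scan2);

Then I configure the job: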

    job.setPartitionerClass(NaturalKeyPartitioner.class);
    job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);
    job.setSortComparatorClass(CompositeKeyComparator.class);
    TableMapReduceUtil.initTableMapperJob(scans, FaultyRegisterReadMapper.class, MeterTimeKey.class, ReadValueTime.class, job);

For some reason, only two mappers are created most of the time. I would like more mappers, but that's not really a big deal.

The really bad part is that SOMETIMES it creates three mappers, and when it does, the first two mappers finish quite quickly but the third doesn't even start for five minutes. It is this slow-starting mapper that is really bothering me. :)

This is on a cluster of about 60 nodes, and the cluster is not busy.

I suspect the number of mappers is driven by how much data the scans find in the table, but I'm not positive of that.

Main question: Any ideas why one mapper takes so long to start?


1 Answer


Along with the hardware resources of your nodes, I would also check the network traffic. You might be suffering from network saturation (interface errors, framing errors, etc.; the error counters of netstat -i are a quick way to check).

After that, I would make sure of the following:

  • RegionServer hotspotting: An uneven key-space distribution can funnel a huge number of requests at a single region, hammering that RegionServer process and causing slow response times. Do your keys consist of time-series data (e.g., timestamp prefixes)? If so, see the salting sketch after this list.

  • Non-local data regions: Perhaps your job is requesting data that is not local to the DataNode (RegionServers run on DataNodes), forcing HDFS to fetch the blocks from other servers over the network (which involves network traffic as well). A major compaction usually restores locality, since it rewrites each region's data onto its local DataNode.
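
If hotspotting turns out to be the problem, a common mitigation is to salt the row keys so that sequential (e.g., timestamp-based) keys spread across regions. Here is a minimal sketch; SaltedKey and BUCKETS are illustrative names, not anything from the HBase API, and the bucket count should roughly match your region count:

    import org.apache.hadoop.hbase.util.Bytes;

    public class SaltedKey {
        // One bucket per region is a reasonable starting point (your table has 10 regions).
        private static final int BUCKETS = 10;

        // Prefix the key with a stable, hash-derived salt byte so that
        // monotonically increasing keys land in different regions.
        public static byte[] salt(byte[] originalKey) {
            // (hash % BUCKETS + BUCKETS) % BUCKETS keeps the bucket non-negative.
            int bucket = (Bytes.hashCode(originalKey) % BUCKETS + BUCKETS) % BUCKETS;
            byte[] salted = new byte[originalKey.length + 1];
            salted[0] = (byte) bucket;
            System.arraycopy(originalKey, 0, salted, 1, originalKey.length);
            return salted;
        }
    }

The trade-off is that a range scan then has to fan out into one Scan per bucket. In your multi-scan setup that would, incidentally, also tend to give you more mappers, since the input format creates roughly one split per region that each scan's row range touches.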