Edit: It looks like retrieving the counters in the map and reduce tasks through Job or JobConf is not good practice. Here is an alternate approach for passing summary details from the mapper to the reducer. It takes some effort to code, but it is doable. It would have been nice if this feature were part of Hadoop instead of having to be hand-coded. I have requested that this feature be added to Hadoop and am waiting for a response.
With the old MR API, JobCounter.TOTAL_LAUNCHED_MAPS was retrieved in the Reducer class using the code below:
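```java
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapreduce.JobCounter;

// Inside the Reducer class (old org.apache.hadoop.mapred API):
private String jobID;
private long launchedMaps;

// Called once per reduce task, before any call to reduce().
@Override
public void configure(JobConf jobConf) {
    try {
        // ID of the job this task belongs to.
        jobID = jobConf.get("mapred.job.id");
        // Look up the running job and read its counters.
        JobClient jobClient = new JobClient(jobConf);
        RunningJob job = jobClient.getJob(JobID.forName(jobID));
        if (job == null) {
            System.out.println("No job found with ID " + jobID);
        } else {
            Counters counters = job.getCounters();
            launchedMaps = counters.getCounter(JobCounter.TOTAL_LAUNCHED_MAPS);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```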
With the new API, Reducer implementations can access the job's Configuration via JobContext#getConfiguration(), so the code above can be moved into Reducer#setup().
Reducer#configure() in the old MR API and Reducer#setup() in the new MR API are each invoked once per reduce task, before Reducer#reduce() is invoked.
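For comparison, here is a minimal sketch of the same lookup in Reducer#setup() with the new MR API. It assumes Hadoop 2.x, where org.apache.hadoop.mapreduce.Cluster can look up a job by ID; the class name LaunchedMapsReducer and the Text/IntWritable/LongWritable types are hypothetical choices for illustration. The caveat in the edit above about reading counters from inside a task still applies.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobCounter;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: looks up TOTAL_LAUNCHED_MAPS once per reduce task.
public class LaunchedMapsReducer extends Reducer<Text, IntWritable, Text, LongWritable> {

    private long launchedMaps = -1; // -1 means "could not be determined"

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Connect to the cluster using the job's own configuration.
        Cluster cluster = new Cluster(context.getConfiguration());
        try {
            // Every task context already carries the ID of the job it belongs to.
            Job job = cluster.getJob(context.getJobID());
            if (job == null) {
                System.err.println("No job found with ID " + context.getJobID());
                return;
            }
            Counters counters = job.getCounters();
            launchedMaps = counters.findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
        } finally {
            cluster.close();
        }
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Example use only: emit the launched-map count next to each key.
        context.write(key, new LongWritable(launchedMaps));
    }
}
```

Using context.getJobID() avoids reading the job ID from a configuration key such as mapred.job.id.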
BTW, the counters can also be retrieved from a JVM other than the one that submitted the job.
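As an illustration, here is a hedged sketch of a separate client program reading the same counter. It assumes Hadoop 2.x and a client classpath that contains the cluster configuration (the *-site.xml files); PrintLaunchedMaps and launchedMaps() are hypothetical names. As far as I know, for a job that has already finished the lookup is served by the MapReduce JobHistory Server, so that service needs to be running.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobCounter;
import org.apache.hadoop.mapreduce.JobID;

// Hypothetical standalone client: prints TOTAL_LAUNCHED_MAPS for a job ID
// given on the command line, e.g. job_1400000000000_0001.
public class PrintLaunchedMaps {

    static long launchedMaps(Cluster cluster, String jobId)
            throws IOException, InterruptedException {
        Job job = cluster.getJob(JobID.forName(jobId));
        if (job == null) {
            throw new IllegalArgumentException("No job found with ID " + jobId);
        }
        return job.getCounters().findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: PrintLaunchedMaps <job-id>");
            System.exit(2);
        }
        // Cluster settings are read from the *-site.xml files on the classpath.
        Cluster cluster = new Cluster(new Configuration());
        try {
            System.out.println(launchedMaps(cluster, args[0]));
        } finally {
            cluster.close();
        }
    }
}
```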
JobInProgress is annotated as shown below, so it should not be used: the API is intended only for a limited set of projects, and its interface may change.
```java
@InterfaceAudience.LimitedPrivate({"MapReduce"})
@InterfaceStability.Unstable
```
Note that JobCounter.TOTAL_LAUNCHED_MAPS also includes map tasks launched due to speculative execution.
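If the count should not be inflated by speculative attempts, one option is to turn off speculative execution for the map phase when the job is built. This is a minimal sketch assuming the Hadoop 2.x Job API; NoSpeculativeMaps and newJobWithoutMapSpeculation() are hypothetical names. As far as I know, attempts relaunched after a task failure are still counted in TOTAL_LAUNCHED_MAPS.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical helper class for building a job without speculative map attempts.
public class NoSpeculativeMaps {

    static Job newJobWithoutMapSpeculation(Configuration conf) throws IOException {
        Job job = Job.getInstance(conf, "no-speculative-maps");
        // Sets mapreduce.map.speculative=false in the job configuration (Hadoop 2.x key).
        job.setMapSpeculativeExecution(false);
        return job;
    }
}
```

For drivers run through ToolRunner, passing -D mapreduce.map.speculative=false on the command line should have the same effect.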