
I am running iterative Hadoop/MapReduce jobs (Apache Hadoop version 1.1.0) to analyze certain data, and I need to know the number of output records of each reduce task in order to run the next iteration of the M/R job. I can read the consolidated counter after each M/R job, but I cannot find a way to read the counter of each task separately. Please advise me on this.
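For reference, reading the consolidated value from the driver after the job finishes looks roughly like this (a minimal sketch assuming the new mapreduce API; the built-in group/counter names below are the Hadoop 1.x ones and may differ in other versions):

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.Job;

    public class ConsolidatedCounter {
        // Reads the job-wide (consolidated) reduce output record count
        // once the job has completed.
        public static long reduceOutputRecords(Job job) throws IOException {
            return job.getCounters()
                      .findCounter("org.apache.hadoop.mapred.Task$Counter",
                                   "REDUCE_OUTPUT_RECORDS")
                      .getValue();
        }
    }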

Choi


1 Answer


That's not how counters work: each task reports its metrics to a central point, so there is no way of knowing the counter values from individual tasks.

From here: http://www.thecloudavenue.com/2011/11/retrieving-hadoop-counters-in-mapreduce.html

Counters can be incremented using the Reporter in the old MapReduce API or through the Context in the new MapReduce API. These counters are sent to the TaskTracker, which forwards them to the JobTracker, and the JobTracker consolidates the Counters to produce a holistic view for the complete job. The consolidated Counters are not relayed back to the Map and Reduce tasks by the JobTracker, so the Map and Reduce tasks have to contact the JobTracker to get the current value of a Counter.
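To make that concrete, here is a minimal sketch of incrementing a counter through the Context in the new mapreduce API (the "MyCounters" / "OUTPUT_RECORDS" names are illustrative, not part of Hadoop):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CountingReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
            // Sent to the TaskTracker, then consolidated by the JobTracker.
            context.getCounter("MyCounters", "OUTPUT_RECORDS").increment(1);
        }
    }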

I suppose you could create task-specific counters (by prefixing the counter name with the task ID, for example), but you would then end up with a lot of different counters, and, as they are designed to be lightweight, you might run into problems with that many. The threshold is fairly high, though: I once tested the limit, and the node crashed when I reached something like a million counters!
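A minimal sketch of that idea, assuming the new mapreduce API (the group name "PER_TASK_OUTPUT" is illustrative): each reduce task folds its own task ID into the counter name, and the driver then walks the group to recover one value per task.

    // Inside the reducer's reduce() method: one counter per reduce
    // task, keyed by the task ID, incremented per emitted record.
    String taskId = context.getTaskAttemptID().getTaskID().toString();
    context.getCounter("PER_TASK_OUTPUT", taskId).increment(1);

    // In the driver, after the job completes: iterate the group to
    // recover the per-task record counts for the next iteration.
    for (org.apache.hadoop.mapreduce.Counter counter :
             job.getCounters().getGroup("PER_TASK_OUTPUT")) {
        System.out.println(counter.getName() + " -> " + counter.getValue());
    }

Since each task only ever increments its own counter, the consolidated view the JobTracker produces already contains one distinct counter per reduce task.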