Hadoop MapReduce Jobs: Get the counter outside of the native jvm

Question

I am very new in Hadoop and Hbase.

And my use case is very simple: I want to get reduce input groups count for a job in the run time (i.e get the counter being updated from the initiation to the termination of the job ).

What I have searched so far: All job related logs are written under directory /var/log/hadoop/userlogs like shown below:

[root@dev1-slave1 userlogs]# pwd
/var/log/hadoop/userlogs
[root@dev1-slave1 userlogs]# ll
total 24
drwx--x--- 2 mapred mapred 4096 Jan 13 19:59 job_201501121917_0008
drwx--x--- 2 mapred mapred 4096 Jan 13 11:31 job_201501121917_0009
drwx--x--- 2 mapred mapred 4096 Jan 13 12:01 job_201501121917_0010
drwx--x--- 2 mapred mapred 4096 Jan 13 12:13 job_201501121917_0011
drwx--x--- 2 mapred mapred 4096 Jan 13 12:23 job_201501121917_0012
drwx--x--- 2 mapred mapred 4096 Jan 13 19:59 job_201501121917_0013

Under each job, there are directories such as attempt_201501121917_0013_m_000000_0 (mapper log) and attempt_201501121917_0013_r_000000_0 (reducer log).

The reducer log directory attempt_201501121917_0013_r_000000_0 contains syslog which contains information about job run. But it doesn't show any information about the counter.

From the jobtracker UI of hadoop, I could see the counter reduce input groups being updated until the job is finished but I could not find the same elsewhere.

How can I achieve this? Is there any Java API to get job-wise counters in an another application (NOT in the application which is performing mapreduce tasks) ?

Any other logs or other files which I should look into?

I hope my requirement is clear.

UPDATE:

Hadoop version: Hadoop 1.0.3-Intel

yurgis yurgis · Accepted Answer · 2015-01-13T19:54:41

Assuming you know your job id, you can look up your job by id (I think for some limited time depending how soon your cluster cleans up job history).

public long getInputGroups(String jobId, Configuration conf) {
    Cluster cluster = new Cluster(conf);
    Job job = cluster.getJob(JobID.forName(jobId));
    Counters counters = job.getCounters();
    Counter counter = counters.findCounter("org.apache.hadoop.mapred.Task$Counter","REDUCE_I‌NPUT_GROUPS");
    return counter.getValue();
}

For more reading see Hadoop: The Definitive Guide.

Hadoop MapReduce Jobs: Get the counter outside of the native jvm

3 Answers