1
votes

I'm processing zip files in Hadoop. Each zip file contains 2000 XML files, and a single mapper takes 60 to 90 minutes to complete. I'm running on Windows, on a 6-core machine with 12 GB of RAM.

My question: the progress bar only shows a result once the process has completed. The progress status stays at 0% until the task finishes.

How can I programmatically change the progress value?

I tried the following code:

InputDocXmlCount++;
if (InputDocXmlCount % 100 == 0)
{
    context.progress();
    runningJob.mapProgress();
}

But I don't know how to make this work. Can anyone help me?


2 Answers

1
votes

The MapReduce framework can't work out a percentage here because (I assume) you are using a custom InputFormat. The framework has no way to count the XML files inside the zip for you, or to predict that you will report progress once per 100 records.
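
To illustrate the point: as far as I know, the map-side percentage is derived from whatever the job's RecordReader returns from getProgress(), so a reader that does not track its position inside the zip leaves the bar at 0% until the end. Below is a minimal sketch of the bookkeeping such a reader could do, in plain Java with no Hadoop dependency. The class name ZipProgressTracker is hypothetical, and knowing the entry count up front (2000 in the question) is an assumption.

```java
/**
 * Hypothetical helper: tracks how many zip entries have been consumed so a
 * custom RecordReader could return this fraction from its getProgress().
 */
final class ZipProgressTracker {
    private final long totalEntries; // assumed to be known up front
    private long entriesRead;

    ZipProgressTracker(long totalEntries) {
        if (totalEntries < 0) {
            throw new IllegalArgumentException("totalEntries must be >= 0");
        }
        this.totalEntries = totalEntries;
    }

    /** Call once for every XML entry handed to the mapper. */
    void entryRead() {
        entriesRead++;
    }

    /** Fraction in [0, 1], clamped so an underestimated total never exceeds 100%. */
    float getProgress() {
        if (totalEntries == 0) {
            return 1.0f; // nothing to read counts as done
        }
        return Math.min(1.0f, (float) entriesRead / totalEntries);
    }
}
```

A custom reader would call entryRead() each time it emits a record and delegate its own getProgress() to the tracker.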

However, take a look at MapReduce counters. With them you can at least count the number of XML files you have already processed.
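
A rough sketch of the counter idea, written in plain Java so it runs without Hadoop on the classpath. The enum and class names are hypothetical, and the EnumMap is only a stand-in for the framework's counter store. In a real mapper the increment would be context.getCounter(XmlCounter.XML_FILES_PROCESSED).increment(1), and the totals would show up alongside the job's built-in counters.

```java
import java.util.EnumMap;
import java.util.Map;

final class XmlCounterSketch {
    /** Hypothetical counter names; Hadoop counters are commonly keyed by an enum like this. */
    enum XmlCounter { XML_FILES_PROCESSED, XML_FILES_FAILED }

    // Stand-in for the framework's counter store (assumption, for illustration only).
    private final Map<XmlCounter, Long> counters = new EnumMap<>(XmlCounter.class);

    /** Adds one to the given counter, creating it on first use. */
    void increment(XmlCounter counter) {
        counters.merge(counter, 1L, Long::sum);
    }

    /** Current value of the counter, or 0 if it was never incremented. */
    long get(XmlCounter counter) {
        return counters.getOrDefault(counter, 0L);
    }

    /** Records the outcome of one XML entry from the zip. */
    void processEntry(boolean parsedOk) {
        increment(parsedOk ? XmlCounter.XML_FILES_PROCESSED : XmlCounter.XML_FILES_FAILED);
    }
}
```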

0
votes

You don't have direct control of the progress value, but you could consider implementing a customized status message by calling TaskAttemptContext#setStatus from within your mapper code. For example, you could make this a dynamic message including the count of XML files processed, and periodically update that count in the status string.
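
A minimal sketch of that periodic status update, kept free of Hadoop imports so it can be run on its own. StatusSink is a hypothetical stand-in for the two TaskAttemptContext methods involved (setStatus and progress); in a real mapper you would call them on the context object directly. The reporting interval of 100 is taken from the question's code.

```java
final class StatusUpdateSketch {
    /** Hypothetical stand-in for the TaskAttemptContext calls used below. */
    interface StatusSink {
        void setStatus(String message);
        void progress();
    }

    private final StatusSink sink;
    private final int reportEvery;
    private long processed;

    StatusUpdateSketch(StatusSink sink, int reportEvery) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be > 0");
        }
        this.sink = sink;
        this.reportEvery = reportEvery;
    }

    /** Builds the dynamic status string shown for the task. */
    static String statusMessage(long processed) {
        return "Processed " + processed + " XML files";
    }

    /** Call once per XML file; updates the status every reportEvery files. */
    void xmlFileProcessed() {
        processed++;
        if (processed % reportEvery == 0) {
            sink.setStatus(statusMessage(processed));
            sink.progress(); // signals the task is still alive; does not change the percentage
        }
    }
}
```

Updating only every N files keeps the reporting overhead small compared with the XML parsing itself.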