
I have a Hadoop MapReduce job running over a large amount of data. The map phase took a long time (~2-3 days), but it completed.

However, the job then failed at ~92% of the reduce phase. Is it possible to recover the output of the successful map tasks so that only the reduce phase needs to be re-run?

Environment: Hadoop 1.2.1, Java 7, single-node Linux system.


1 Answer


No, this isn't possible. If your mapper logic is computationally intensive (rather than I/O-heavy), you can either multithread it using MultithreadedMapper or split the work into two jobs. The first job runs the expensive map logic and persists its output; the second job then just "identity maps" that output and runs the reduce. If the second job fails, only it needs to be re-run.
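A rough sketch of the two-job split, assuming the new `org.apache.hadoop.mapreduce` API that ships with Hadoop 1.2.1. The class names `TwoStageDriver`, `ExpensiveMapper` and `SumReducer`, the `Text`/`IntWritable` key and value types, and the `reduce-only` command-line argument are all placeholders invented for illustration; substitute your own mapper, reducer and types. Stage 1 is a map-only job (zero reducers) that writes its output as SequenceFiles. Stage 2 reads those files with the base `Mapper` class, which passes records through unchanged, and runs the real reducer.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class TwoStageDriver {

    // Placeholder for the long-running map logic (hypothetical).
    public static class ExpensiveMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // The expensive computation would go here.
            context.write(new Text(line.toString().trim()), ONE);
        }
    }

    // Placeholder for the real reduce logic (hypothetical).
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]); // persisted map output
        Path output = new Path(args[2]);
        // Hypothetical switch: skip stage 1 when its output already exists.
        boolean reduceOnly = args.length > 3 && "reduce-only".equals(args[3]);

        Configuration conf = new Configuration();

        if (!reduceOnly) {
            // Stage 1: map-only job; mapper output is written directly to disk.
            Job mapJob = new Job(conf, "stage 1: expensive map");
            mapJob.setJarByClass(TwoStageDriver.class);
            mapJob.setMapperClass(ExpensiveMapper.class);
            mapJob.setNumReduceTasks(0);
            mapJob.setOutputKeyClass(Text.class);
            mapJob.setOutputValueClass(IntWritable.class);
            mapJob.setOutputFormatClass(SequenceFileOutputFormat.class);
            FileInputFormat.addInputPath(mapJob, input);
            FileOutputFormat.setOutputPath(mapJob, intermediate);
            if (!mapJob.waitForCompletion(true)) {
                System.exit(1);
            }
        }

        // Stage 2: identity map over the stage 1 output, then the real reduce.
        Job reduceJob = new Job(conf, "stage 2: identity map + reduce");
        reduceJob.setJarByClass(TwoStageDriver.class);
        reduceJob.setInputFormatClass(SequenceFileInputFormat.class);
        reduceJob.setMapperClass(Mapper.class); // base Mapper is an identity mapper
        reduceJob.setReducerClass(SumReducer.class);
        reduceJob.setOutputKeyClass(Text.class);
        reduceJob.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(reduceJob, intermediate);
        FileOutputFormat.setOutputPath(reduceJob, output);
        System.exit(reduceJob.waitForCompletion(true) ? 0 : 1);
    }
}
```

With this layout, a failure in stage 2 would be retried by passing `reduce-only` as the fourth argument (and a fresh output path), so the multi-day map work in the intermediate directory is reused. The trade-off is the extra disk space and I/O for the intermediate SequenceFiles.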