
I am running an s3distcp job on AWS EMR with Hadoop 2.2.0, and the job keeps failing because one reducer task fails after 3 attempts. I also tried setting both:

mapred.max.reduce.failures.percent
mapreduce.reduce.failures.maxpercent

to 50, both in the Oozie Hadoop action configuration and in mapred-site.xml, but the job still fails.
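For context, the relevant fragment of the Oozie action configuration looks roughly like this (a sketch; surrounding action details are omitted, and both the old and the new property names are set to allow up to 50% of reducers to fail):

```xml
<!-- Sketch of the <configuration> block inside the Oozie action.
     Property names are the actual Hadoop ones; the rest of the
     action definition is omitted for brevity. -->
<configuration>
    <property>
        <name>mapred.max.reduce.failures.percent</name>
        <value>50</value>
    </property>
    <property>
        <name>mapreduce.reduce.failures.maxpercent</name>
        <value>50</value>
    </property>
</configuration>
```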

And here are the logs:

2015-10-02 14:42:16,001 INFO [main] org.apache.hadoop.mapreduce.Job: Task Id : attempt_1443541526464_0115_r_000010_2, Status : FAILED
2015-10-02 14:42:17,005 INFO [main] org.apache.hadoop.mapreduce.Job: map 100% reduce 93%
2015-10-02 14:42:29,048 INFO [main] org.apache.hadoop.mapreduce.Job: map 100% reduce 98%
2015-10-02 15:04:20,369 INFO [main] org.apache.hadoop.mapreduce.Job: map 100% reduce 100%
2015-10-02 15:04:21,378 INFO [main] org.apache.hadoop.mapreduce.Job: Job job_1443541526464_0115 failed with state FAILED due to: Task failed task_1443541526464_0115_r_000010
Job failed as tasks failed. failedMaps:0 failedReduces:1

2015-10-02 15:04:21,451 INFO [main] org.apache.hadoop.mapreduce.Job: Counters: 45
    File System Counters
        FILE: Number of bytes read=280
        FILE: Number of bytes written=10512783
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=32185011
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=170
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=28
    Job Counters
        Failed reduce tasks=4
        Launched map tasks=32
        Launched reduce tasks=18
        Data-local map tasks=15
        Rack-local map tasks=17
        Total time spent by all maps in occupied slots (ms)=2652786
        Total time spent by all reduces in occupied slots (ms)=65506584
    Map-Reduce Framework
        Map input records=156810
        Map output records=156810
        Map output bytes=30892192
        Map output materialized bytes=6583455
        Input split bytes=3904
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=7168
        Reduce input records=0
        Reduce output records=0
        Spilled Records=156810
        Shuffled Maps =448
        Failed Shuffles=0
        Merged Map outputs=448
        GC time elapsed (ms)=2524
        CPU time spent (ms)=108250
        Physical memory (bytes) snapshot=14838984704
        Virtual memory (bytes) snapshot=106769969152
        Total committed heap usage (bytes)=18048614400
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=32181107
    File Output Format Counters
        Bytes Written=0
2015-10-02 15:04:21,451 INFO [main] com.amazon.external.elasticmapreduce.s3distcp.S3DistCp: Try to recursively delete hdfs:/tmp/218ad028-8035-4f97-b113-3cfea04502fc/tempspace
2015-10-02 15:04:21,515 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2015-10-02 15:04:21,516 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2015-10-02 15:04:21,554 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1443541526464_0114_m_000000_0 is done. And is in the process of committing
2015-10-02 15:04:21,570 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1443541526464_0114_m_000000_0 is allowed to commit now
2015-10-02 15:04:21,584 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1443541526464_0114_m_000000_0' to hdfs://rnd2-emr-head.ec2.int$
2015-10-02 15:04:21,598 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1443541526464_0114_m_000000_0' done.
2015-10-02 15:04:21,616 INFO [Thread-6] amazon.emr.metrics.MetricsSaver: Inside MetricsSaver Shutdown Hook

Any suggestions would be much appreciated.


1 Answer


Can you try cleaning the HDFS /tmp directory? Take a backup of the directory first, since other applications also use /tmp; if you run into any issues afterwards, you can restore it from the backup.
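The backup-and-clean step could look something like this (a sketch; the backup path is illustrative, and you should run it while no jobs are writing to /tmp):

```shell
# Back up the HDFS /tmp directory before cleaning it
# (the /tmp_backup location is just an example).
hadoop fs -mkdir -p /tmp_backup
hadoop fs -cp 'hdfs:///tmp/*' hdfs:///tmp_backup/

# Clear the temporary data; -skipTrash frees the space immediately.
hadoop fs -rm -r -skipTrash 'hdfs:///tmp/*'

# If another application misbehaves afterwards, restore from the backup:
# hadoop fs -cp 'hdfs:///tmp_backup/*' hdfs:///tmp/
```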