Hadoop distcp jobs SUCCEEDED but attempt_xxx killed by ApplicationMaster

Question

When running a distcp job, I encounter the following problem: almost all map tasks are marked as successful, but with a note saying "Container killed".

In the web interface, the log for the map tasks says: Progress 100.00 State SUCCEEDED

but under Note, almost every attempt (~200) shows: Container killed by the ApplicationMaster. Container killed by the ApplicationMaster. Container killed on request. Exit code is 143

In the log file associated with the attempt I can see a log saying Task 'attempt_xxxxxxxxx_0' done.

stderr output is empty for all jobs/attempts.

When looking at the application master log and following one of the successful (but killed) attempts I find the following logs:

2017-01-05 10:27:22,772 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1483370705805_4012_m_000000_0
2017-01-05 10:27:22,773 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1483370705805_4012_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2017-01-05 10:27:22,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2017-01-05 10:27:22,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1483370705805_4012Job Transitioned from RUNNING to COMMITTING
2017-01-05 10:27:22,776 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
2017-01-05 10:27:23,118 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-01-05 10:27:24,125 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e116_1483370705805_4012_01_000002
2017-01-05 10:27:24,126 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:1 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2017-01-05 10:27:24,126 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1483370705805_4012_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
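
For context on the exit code in the diagnostics above: by the common Unix convention, an exit status above 128 means the process was terminated by a signal whose number is the status minus 128, so 143 corresponds to signal 15 (SIGTERM). The following is a minimal Python sketch of that arithmetic; the helper name describe_exit_code is hypothetical and not part of Hadoop.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status using the common 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal, e.g. 15 -> SIGTERM.
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


# Exit code taken from the diagnostics in the question.
print(describe_exit_code(143))
```

A SIGTERM is a request to shut down rather than a crash, which is consistent with the "Container killed on request" wording in the diagnostics.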

I have already set "mapreduce.map.speculative=false".

All map tasks SUCCEEDED (the distcp job has no reduce phase), but the MapReduce job keeps running for a long time (several hours) before it finally succeeds and the distcp job completes.

I am running Hadoop 2.5.0-cdh5.3.1 (as reported by 'yarn version').

Should I be worried about this? And what causes the containers to be killed? Any suggestions would be greatly appreciated!

facha · Accepted Answer · 2017-01-05T09:45:13

Those killed attempts might be due to speculative execution. In this case there is nothing to worry about.

To confirm that this is the case, try running your distcp like this:

hadoop distcp -Dmapreduce.map.speculative=false ...

You should stop seeing those killed attempts.
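
One way to check that the killed attempts are harmless is to confirm, from the ApplicationMaster log, that every attempt reported as killed had already been recorded as succeeded. The sketch below assumes the log line formats shown in the question; the function name killed_after_success and the two regular expressions are illustrative, not part of any Hadoop tooling.

```python
import re

# Patterns based on the ApplicationMaster log lines quoted in the question.
SUCCEEDED = re.compile(r"Task succeeded with attempt (attempt_\w+)")
KILLED = re.compile(
    r"Diagnostics report from (attempt_\w+): "
    r"Container killed by the ApplicationMaster"
)


def killed_after_success(log_lines):
    """Split killed attempts into those that had already succeeded and the rest."""
    succeeded = set()
    killed_ok, killed_other = [], []
    for line in log_lines:
        match = SUCCEEDED.search(line)
        if match:
            succeeded.add(match.group(1))
            continue
        match = KILLED.search(line)
        if match:
            attempt = match.group(1)
            if attempt in succeeded:
                killed_ok.append(attempt)
            else:
                killed_other.append(attempt)
    return killed_ok, killed_other


# Two lines copied from the log excerpt in the question.
sample = [
    "2017-01-05 10:27:22,772 INFO [AsyncDispatcher event handler] "
    "org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: "
    "Task succeeded with attempt attempt_1483370705805_4012_m_000000_0",
    "2017-01-05 10:27:24,126 INFO [AsyncDispatcher event handler] "
    "org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: "
    "Diagnostics report from attempt_1483370705805_4012_m_000000_0: "
    "Container killed by the ApplicationMaster.",
]
killed_ok, killed_other = killed_after_success(sample)
print("killed after success:", killed_ok)
print("killed without success:", killed_other)
```

If the second list is empty for the whole log, every kill happened after the task's work was already done; any attempt that lands in it would deserve a closer look.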
