I just finished setting up a small Hadoop cluster (three Ubuntu machines running Apache Hadoop 2.2.0) and am now trying to run Python streaming jobs.
Running a test job, I encounter the following problem:
Almost all map tasks are marked as successful, but with a note saying "Container killed".
In the web interface, the log for the map tasks says:
Progress 100.00
State SUCCEEDED
but under Note it says, for almost every attempt (~200):
Container killed by the ApplicationMaster.
or
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
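As far as I understand, an exit code above 128 conventionally means the process was terminated by a signal, with the signal number being the code minus 128, so 143 would correspond to signal 15 (SIGTERM). A minimal sketch of that arithmetic, where `describe_exit_code` is just a helper name I made up for illustration:

```python
import signal


def describe_exit_code(code):
    """Decode a shell-style exit status (hypothetical helper, not a Hadoop API)."""
    if code > 128:
        # By shell convention, 128 + N means "terminated by signal N".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "terminated by signal %d (%s)" % (signum, name)
    return "exited with status %d" % code


print(describe_exit_code(143))
```

If that reading is right, the container was asked to terminate rather than crashing on its own.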
In the log file associated with the attempt, I can see a line saying Task 'attempt_xxxxxxxxx_0' done.
I also get 3 attempts with the same log, except that those 3 have
State KILLED
and are listed under killed jobs.
The stderr output is empty for all jobs/attempts.
When looking at the application master log and following one of the successful (but killed) attempts, I find the following log entries:
- Transitioned from NEW to UNASSIGNED
- Transitioned from UNASSIGNED to ASSIGNED
- several progress updates, including: 1.0
- Done acknowledgement
- RUNNING to SUCCESS_CONTAINER_CLEANUP
- CONTAINER_REMOTE_CLEANUP
- KILLING attempt_xxxx
- Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
- Task Transitioned from RUNNING to SUCCEEDED
All the attempts are numbered xxxx_0, so I assume they are not being killed as a result of speculative execution.
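To double-check that assumption, something like the sketch below could group attempt IDs by task and flag any task that has more than one attempt or an attempt number above 0. It assumes the usual `attempt_<timestamp>_<job>_<m|r>_<task>_<attempt>` layout; the function names and the sample IDs are made up for illustration and are not taken from my cluster.

```python
import re
from collections import defaultdict

# Assumed layout: attempt_<clusterTimestamp>_<jobId>_<m|r>_<taskId>_<attemptNumber>
ATTEMPT_RE = re.compile(r"^attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)$")


def attempts_per_task(attempt_ids):
    """Map (job, kind, task) to the list of attempt numbers seen for that task."""
    tasks = defaultdict(list)
    for attempt_id in attempt_ids:
        match = ATTEMPT_RE.match(attempt_id)
        if match is None:
            continue  # skip anything that does not look like an attempt ID
        _timestamp, job, kind, task, attempt = match.groups()
        tasks[(job, kind, task)].append(int(attempt))
    return dict(tasks)


def tasks_with_extra_attempts(attempt_ids):
    """Return tasks that were retried or speculated (more than attempt 0)."""
    return {
        task: sorted(numbers)
        for task, numbers in attempts_per_task(attempt_ids).items()
        if len(numbers) > 1 or max(numbers) > 0
    }


# Hypothetical sample IDs, for illustration only.
sample = [
    "attempt_1400000000000_0001_m_000000_0",
    "attempt_1400000000000_0001_m_000001_0",
    "attempt_1400000000000_0001_m_000001_1",
]
print(tasks_with_extra_attempts(sample))
```

An empty result for the real attempt list would support the idea that every task ran exactly once.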
Should I be worried about this? And what causes the containers to be killed? Any suggestions would be greatly appreciated!