
I have been trying to run a pig job with multiple steps on Amazon EMR. Here are the details of my environment:

Number of nodes: 20
AMI version: 3.1.0
Hadoop distribution: 2.4.0

The Pig script has multiple steps and spawns a long-running MapReduce job with both a map phase and a reduce phase. After running for some time (sometimes an hour, sometimes three or four), the job is killed. The information on the ResourceManager page for the job is:

Kill job received from hadoop (auth:SIMPLE) at Job received Kill while in RUNNING state.

Obviously, I did not kill it :)

My question is: how do I go about identifying what exactly happened? How do I diagnose the issue? Which log files should I look at, and what should I grep for? Any pointers on where to find the appropriate log files would be greatly appreciated. I am new to YARN/Hadoop 2.0.


1 Answer


There can be a number of reasons. Enable debugging on your cluster and check the stderr logs for more information.

# the instance settings below (m3.xlarge x 3) are example values; adjust them for your own cluster
aws emr create-cluster --name "Test cluster" --ami-version 3.9 --log-uri s3://mybucket/logs/ \
--enable-debugging --applications Name=Hue Name=Hive Name=Pig \
--use-default-roles --instance-type m3.xlarge --instance-count 3

More details here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-debugging.html
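
Note that --log-uri and --enable-debugging are set when the cluster is created, so for the cluster whose job was already killed you will have to work from the logs EMR pushed to your S3 log bucket (or from the master node while the cluster is still running). Below is a rough sketch of how to find the kill diagnostics; the cluster ID (j-XXXXXXXX) and application ID (application_1400000000000_0001) are placeholders, and the log URI is the one from the command above.

# See what EMR pushed for the cluster (step logs, node/daemon logs, container or
# task-attempt logs -- the exact prefixes vary a bit by AMI version)
aws s3 ls s3://mybucket/logs/j-XXXXXXXX/

# Pull down the logs and grep for the kill diagnostics; the files in S3 are
# usually gzipped, so zgrep handles both compressed and plain files
aws s3 cp s3://mybucket/logs/j-XXXXXXXX/ ./emr-logs/ --recursive
find ./emr-logs -type f | xargs zgrep -il "kill"

# Or, while the cluster is still up and if YARN log aggregation is enabled,
# collect the application logs directly on the master node
yarn logs -applicationId application_1400000000000_0001 > app.log
grep -i "kill" app.log

The ApplicationMaster syslog for the MapReduce application is usually the most informative place to start, since it records the diagnostics for why the job left the RUNNING state.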