I have been trying to run a pig job with multiple steps on Amazon EMR. Here are the details of my environment:
Number of nodes: 20 AMI Version: 3.1.0 Hadoop Distribution: 2.4.0
The pig script has multiple steps and it spawns a long-running map reduce job that has both a map phase and reduce phase. After running for sometime (sometimes an hour, sometimes three or four), the job is killed. The information on the resource manager for the job is:
Kill job received from hadoop (auth:SIMPLE) at Job received Kill while in RUNNING state.
Obviously, I did not kill it :)
My question is: how do I go about trying to identify what exactly happened? How do I diagnose the issue? Which log files to look at (what to grep for)? Any help on even where the appropriate log files would be greatly helpful. I am new to YARN/Hadoop 2.0