I'm having problems running Hadoop jobs on YARN, specifically on a cluster deployed with Ambari. I'm new to Hadoop; I have written MR jobs, but I have no experience in cluster administration.
I'm trying to run the WordCount example on a small input file (about 1.4 MB), and most of the time I get an exception like the following:
Application application_1453983463294_0005 failed 2 times due to AM Container for appattempt_1453983463294_0005_000002 exited with exitCode: -104
For more detailed output, check application tracking page:http://hdp-master.nissatech.local:8088/cluster/app/application_1453983463294_0005Then, click on links to logs of each attempt.
Diagnostics: Container [pid=23429,containerID=container_1453983463294_0005_02_000001] is running beyond physical memory limits. Current usage: 264.6 MB of 256 MB physical memory used; 1.9 GB of 537.6 MB virtual memory used. Killing container.
It seems that I should change the heap limit, but I don't understand how it's possible that such an amount of memory is needed for such a small job.
YARN was installed using the Ambari default setup, so I haven't changed any of the parameters. This is a small cluster of 4 machines, 3 of which are used as DataNodes/NodeManagers (they also host RegionServers, which are not used at the moment). Each worker has 4 GB of RAM and 4 cores.
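From what I've read so far, I suspect the settings involved are the Application Master and task container sizes in mapred-site.xml, but this is only my guess and the values below are placeholders I picked for illustration, not something I've tested:

    <!-- mapred-site.xml: my guess at raising the MR Application Master and map
         container sizes; the values here are placeholders, not recommendations -->
    <property>
      <name>yarn.app.mapreduce.am.resource.mb</name>
      <value>1024</value>   <!-- container size requested for the AM -->
    </property>
    <property>
      <name>yarn.app.mapreduce.am.command-opts</name>
      <value>-Xmx768m</value>   <!-- AM JVM heap, kept below the container size -->
    </property>
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>1024</value>   <!-- container size for map tasks -->
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx768m</value>   <!-- map task JVM heap -->
    </property>

If these are the wrong knobs, or if the right fix is at the YARN scheduler level instead, please correct me.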
What is the concrete problem, and how can I solve it?
Additionally, I would be grateful for any reference that could help me understand how to set up and configure a small cluster (e.g. up to 10 machines). By that I mean how much RAM and CPU to allocate.