I'm having problems running Hadoop jobs on YARN, specifically on a cluster deployed with Ambari. I'm new to Hadoop; I have written MR jobs, but I have no experience in cluster administration.
I'm trying to run the WordCount example on a small input file (about 1.4 MB), and most of the time I get an exception like the following:
Application application_1453983463294_0005 failed 2 times due to AM Container for appattempt_1453983463294_0005_000002 exited with exitCode: -104
For more detailed output, check application tracking page:http://hdp-master.nissatech.local:8088/cluster/app/application_1453983463294_0005Then, click on links to logs of each attempt.
Diagnostics: Container [pid=23429,containerID=container_1453983463294_0005_02_000001] is running beyond physical memory limits. Current usage: 264.6 MB of 256 MB physical memory used; 1.9 GB of 537.6 MB virtual memory used. Killing container.
It seems that I should change the heap limit, but I don't understand how it's possible that such an amount of memory is needed for such a small job.
YARN was installed using the Ambari default setup, so I haven't changed any of the parameters. This is a small cluster of 4 machines, 3 of which are used as DataNodes/NodeManagers (they also host RegionServers, which are not used at the moment). Each worker has 4 GB of RAM and 4 cores.
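From what I've read so far, I suspect the settings involved are the Application Master and task container sizes in mapred-site.xml, but this is only my guess and the values below are placeholders I picked for illustration, not something I've tested:

    <!-- mapred-site.xml: my guess at raising the MR Application Master and map
         container sizes; the values here are placeholders, not recommendations -->
    <property>
      <name>yarn.app.mapreduce.am.resource.mb</name>
      <value>1024</value>   <!-- container size requested for the AM -->
    </property>
    <property>
      <name>yarn.app.mapreduce.am.command-opts</name>
      <value>-Xmx768m</value>   <!-- AM JVM heap, kept below the container size -->
    </property>
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>1024</value>   <!-- container size for map tasks -->
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx768m</value>   <!-- map task JVM heap -->
    </property>

If these are the wrong knobs, or if the right fix is at the YARN scheduler level instead, please correct me.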
What is the concrete problem, and how can I solve it?
Additionally, I would be grateful for any reference that could help me understand how to set up and configure a small cluster (e.g. up to 10 machines). By that I mean how much RAM and CPU to allocate.