I have an Hadoop 2.2 cluster deployed on a small number of powerful machines. I have a constraint to use YARN as the framework, which I am not very familiar with.
- How do I control the number of actual map and reduce tasks that will run in parallel? Each machine has many CPU cores (12-32) and enough RAM. I want to utilize them maximally.
- How can I monitor that my settings actually led to a better utilization of the machine? Where can I check how many cores (threads, processes) were used during a given job?
Thanks in advance for helping me melt these machines :)