I use CDH 5.1.0 (Hadoop 2.3.0) with 2 name nodes (2x 32 GB RAM, 2 cores) and 3 data nodes (3x 16 GB RAM, 2 cores).
I am scheduling MapReduce jobs as a single user in the default queue (there are no other users, and no other queues are configured).
When using the Capacity Scheduler, the following happens: I am able to submit multiple jobs, but only 2 jobs execute (status 'running') in parallel.
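As far as I know, the Capacity Scheduler bounds the number of concurrently running applications through the share of cluster resources that ApplicationMasters may consume. A minimal capacity-scheduler.xml sketch of the relevant knobs (the values shown are the upstream defaults, not settings I have changed):

    <?xml version="1.0"?>
    <configuration>
      <!-- Fraction of cluster resources usable by ApplicationMasters; with
           large AM containers this effectively caps concurrent 'running' jobs. -->
      <property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>0.1</value>
      </property>
      <!-- Upper bound on applications that can be pending and running. -->
      <property>
        <name>yarn.scheduler.capacity.maximum-applications</name>
        <value>10000</value>
      </property>
    </configuration>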
When using the Fair Scheduler, the following happens: I submit multiple jobs, and 4 of them are set to status 'running' by the cluster/scheduler. These jobs remain stuck at 5% progress forever. If one of the running jobs is killed, the next job is set to status 'running', again at 5% with no further progress. Jobs only start to execute once there are fewer than 4 jobs and no further jobs are submitted to the queue.
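From what I have read, the Fair Scheduler's per-queue maxRunningApps element (in the allocation file, fair-scheduler.xml) limits how many applications may be in the 'running' state at once. A minimal sketch with an illustrative cap of 2 (not my current setting; the pools are at their defaults, as noted below):

    <?xml version="1.0"?>
    <allocations>
      <queue name="default">
        <!-- Illustrative cap: at most 2 apps from this queue run at once,
             so AMs cannot occupy every container slot in the cluster. -->
        <maxRunningApps>2</maxRunningApps>
      </queue>
    </allocations>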
I have reconfigured the cluster multiple times, but I was never able to increase the number of running jobs with the Capacity Scheduler, or to avoid the hang-up of jobs with the Fair Scheduler.
My question is: how do I configure the cluster/YARN/scheduler/dynamic and static resource pools to make scheduling work?
Here are some of the config parameters:
yarn.scheduler.minimum-allocation-mb = 2 GB (2048 MB)
yarn.scheduler.maximum-allocation-mb = 12 GB (12288 MB)
yarn.scheduler.minimum-allocation-vcores = 1
yarn.scheduler.maximum-allocation-vcores = 2
yarn.nodemanager.resource.memory-mb = 12 GB (12288 MB)
yarn.nodemanager.resource.cpu-vcores = 2
mapreduce.map.memory.mb = 12 GB (12288 MB)
mapreduce.reduce.memory.mb = 12 GB (12288 MB)
mapreduce.map.java.opts.max.heap = 9.6 GB (≈ 9830 MB)
mapreduce.reduce.java.opts.max.heap = 9.6 GB (≈ 9830 MB)
yarn.app.mapreduce.am.resource.mb = 12 GB (12288 MB)
ApplicationMaster Java Maximum Heap Size = 788 MB
mapreduce.task.io.sort.mb = 1 GB (1024 MB)
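For reference, this is roughly how the YARN values above would look in a raw yarn-site.xml (MB equivalents of the GB figures; Cloudera Manager generates these files itself, so this is only a sketch of the effective configuration):

    <?xml version="1.0"?>
    <configuration>
      <!-- Smallest and largest container the scheduler will hand out. -->
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>2048</value>
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>12288</value>
      </property>
      <!-- Resources each NodeManager offers to YARN containers. -->
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>12288</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
      </property>
    </configuration>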
I have left the Static and Dynamic Resource Pools at the Cloudera defaults (e.g. the Max Running Apps setting is empty).