Hi, we recently upgraded from MR1 to YARN. I know that a container is an abstract notion, but I don't understand how many JVM tasks (map, reduce, filter, etc.) one container can spawn. Put another way: is a container reusable across multiple map or reduce tasks? I read the following in a blog post, "What is a container in YARN?":
"each mapper and reducer runs on its own container to be accurate!"
That would mean that if I look at the AM logs, I should see the number of allocated containers equal to the number of map tasks (failed or successful) plus the number of reduce tasks. Is that correct?
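To make the count I have in mind concrete, here is a rough Python sketch. The function name and the sample numbers are made up for illustration. It assumes that every task attempt (including failed ones) gets its own container and that the AM itself occupies one more container. It also ignores uber mode, where a small job runs inside the AM's container.

```python
def expected_containers(map_attempts, reduce_attempts, include_am=True):
    """Estimate the containers allocated over the life of one MR job.

    Assumption (mine, not confirmed): one container per task attempt,
    plus one container for the ApplicationMaster.
    """
    if map_attempts < 0 or reduce_attempts < 0:
        raise ValueError("attempt counts must be non-negative")
    am_containers = 1 if include_am else 0
    return map_attempts + reduce_attempts + am_containers


# Hypothetical job: 10 map attempts (8 succeeded, 2 failed) and 2 reduce attempts.
total = expected_containers(map_attempts=10, reduce_attempts=2)
print(total)  # prints 13
```

If that assumption is wrong, I'd like to know which term in the sum is off.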
I know the number of containers changes during the application life cycle, based on AM requests, splits, the scheduler, etc.
But is there a way to request an initial minimum number of containers for a given application? I think one way is to configure a fair-scheduler queue. Is there anything else that can dictate this?
In the case of MR, suppose I have mapreduce.map.memory.mb = 3072 (3 GB) and mapreduce.map.cpu.vcores = 4. I also have yarn.scheduler.minimum-allocation-mb = 1024 and yarn.scheduler.minimum-allocation-vcores = 1.
Does that mean I will get one container with 4 vcores, or 4 containers with one vcore each?
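This is how I currently picture the request being sized, as a rough Python sketch. normalize_request is my own hypothetical helper, not a YARN API. The rule it encodes (each map task's request is rounded up to a multiple of the scheduler minimum) is an assumption on my part, and the exact rounding step may differ by scheduler.

```python
def normalize_request(requested, minimum, maximum=None):
    """Round a resource request up to a multiple of the scheduler minimum.

    Assumption: the minimum allocation acts as both the floor and the
    rounding step. Optionally cap the result at a maximum allocation.
    """
    if requested <= 0 or minimum <= 0:
        raise ValueError("requested and minimum must be positive")
    # Integer ceiling division, then scale back up to a multiple of minimum.
    normalized = -(-requested // minimum) * minimum
    if maximum is not None:
        normalized = min(normalized, maximum)
    return normalized


# Values from the settings above.
container_memory_mb = normalize_request(3072, 1024)  # mapreduce.map.memory.mb
container_vcores = normalize_request(4, 1)           # mapreduce.map.cpu.vcores

print(f"per map task: {container_memory_mb} MB, {container_vcores} vcores")
```

If that assumption holds, each map task would ask for a single container of 3072 MB and 4 vcores rather than four smaller ones, but I'd like confirmation.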
Also, it's not clear where mapreduce.map.memory.mb and mapreduce.map.cpu.vcores can be specified. Should they be set on the client node, or can they be set per application as well?
Finally, is there a way to see the containers currently assigned to a given application from the RM UI or the AM UI?