I have a Dataproc cluster:
master - 6 cores | 32g
worker{0-7} - 6 cores | 32g
Maximum allocation: memory 24576, vCores 6
I have two Spark Streaming jobs to submit, one after another.
First, I tried submitting them with the default configuration, which has spark.dynamicAllocation.enabled=true.
In about 30% of cases, the first job grabbed almost all available memory and the second was queued, waiting for resources for ages. (The first one is a streaming job that actually used only a small portion of the resources in each batch.)
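For reference, the submit command for this first attempt was essentially the following, with no explicit resource flags; the cluster name, class, and jar path below are placeholders:

```
# First attempt: no explicit resource settings, so the Dataproc/Spark defaults apply,
# including spark.dynamicAllocation.enabled=true.
gcloud dataproc jobs submit spark \
    --cluster=my-cluster \
    --class=com.example.StreamingJobOne \
    --jars=gs://my-bucket/streaming-job.jar
```

(The second job was submitted the same way, right after the first one.)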
My second attempt was to disable dynamic allocation. I submitted the same two jobs with identical configurations:
spark.dynamicAllocation.enabled=false
spark.executor.memory=12g
spark.executor.cores=3
spark.executor.instances=6
spark.driver.memory=8g
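A sketch of how the second attempt was submitted (again, the cluster name, class, and jar path are placeholders):

```
# Second attempt: dynamic allocation disabled, resources requested explicitly.
gcloud dataproc jobs submit spark \
    --cluster=my-cluster \
    --class=com.example.StreamingJobOne \
    --jars=gs://my-bucket/streaming-job.jar \
    --properties=spark.dynamicAllocation.enabled=false,spark.executor.memory=12g,spark.executor.cores=3,spark.executor.instances=6,spark.driver.memory=8g
```

With these settings I expected each job to get exactly 6 executors of 12g / 3 cores, plus an 8g driver.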
Surprisingly, in the YARN UI I saw:
7 running containers with 84g memory allocated for the first job.
3 running containers with 36g memory allocated and 72g reserved memory for the second job.
In the Spark UI there were 6 executors plus the driver for the first job, and 2 executors plus the driver for the second job.
After retrying (killing the previous jobs and submitting the same ones again) without dynamic allocation and with the same configurations, I got a totally different result:
5 containers with 59g memory allocated for each job, plus 71g reserved memory for the second job. In the Spark UI I see 4 executors plus the driver in both cases.
I have a couple of questions:
- If dynamicAllocation=false, why is the number of YARN containers different from the number of executors? (At first I thought the additional YARN container was the driver, but its memory is different.)
- If dynamicAllocation=false, why doesn't YARN create containers matching my exact request, i.e. 6 containers (Spark executors) for each job? And why do two attempts with the same configuration lead to different results?
- If dynamicAllocation=true, how is it possible that a Spark job with low memory consumption takes control of all the YARN resources?
Thanks