I am on CDH 5.7.0 and I am seeing a strange issue with Spark 2 running on a YARN cluster. Here is my job submit command:
spark2-submit --master yarn --deploy-mode cluster \
  --conf "spark.executor.instances=8" \
  --conf "spark.executor.cores=4" \
  --conf "spark.executor.memory=8g" \
  --conf "spark.driver.cores=4" \
  --conf "spark.driver.memory=8g" \
  --class com.learning.Trigger learning-1.0.jar
Even though I have capped the resources my job can use, the actual utilization exceeds the allocated amount.
The job starts with a modest memory footprint of around 8 GB and then grows until it eats up the whole cluster.
I do not have dynamic allocation (spark.dynamicAllocation.enabled) set to true.
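If it matters, this is how the effective value could be checked from inside the driver (a quick sketch, assuming the SparkSession is named spark):

    // Prints the effective setting; None means the property was never set,
    // in which case Spark falls back to the default of false.
    println(spark.conf.getOption("spark.dynamicAllocation.enabled"))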
I am just triggering an INSERT OVERWRITE query through the SparkSession SQL API.
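For reference, the job body is essentially just this (a simplified sketch; the real table names are placeholders here):

    import org.apache.spark.sql.SparkSession

    object Trigger {
      def main(args: Array[String]): Unit = {
        // Hive support is needed for INSERT OVERWRITE on Hive tables
        val spark = SparkSession.builder()
          .appName("Trigger")
          .enableHiveSupport()
          .getOrCreate()

        // The only work the job performs: an INSERT OVERWRITE issued
        // through the SQL API. Table names are placeholders.
        spark.sql(
          "INSERT OVERWRITE TABLE target_db.target_table " +
            "SELECT * FROM source_db.source_table")

        spark.stop()
      }
    }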
Any pointers would be very helpful.