I am running Spark in an HPC environment on SLURM, using Spark standalone mode with Spark version 1.6.1. The problem is that my SLURM node is not fully utilized in standalone mode. I am using spark-submit in my SLURM script. There are 16 cores available on the node, and I get all 16 cores per executor, as I can see on the Spark UI. But only one core per executor is actually used: running top + 1 on the worker node where the executor process runs shows that only one of the 16 CPUs is busy. I have 255 partitions, so the number of partitions does not seem to be the problem here.
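To double-check the partition claim, this is roughly the sanity check I run from spark-shell (sc is predefined there; the input name is only a placeholder, not my real data set):

// Sanity check sketch; "input.sdf" and "mols" are placeholders for my actual input and RDD.
val mols = sc.textFile("input.sdf", minPartitions = 255) // my real job ends up with 255 partitions
println("default parallelism: " + sc.defaultParallelism) // cores Spark believes it can schedule on
println("partitions: " + mols.partitions.length)         // should report 255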
$SPARK_HOME/bin/spark-submit \
--class se.uu.farmbio.vs.examples.DockerWithML \
--master spark://$MASTER:7077 \
--executor-memory 120G \
--driver-memory 10G \
When I change the script to
$SPARK_HOME/bin/spark-submit \
--class se.uu.farmbio.vs.examples.DockerWithML \
--master local[*] \
--executor-memory 120G \
--driver-memory 10G \
I see 0 cores allocated to the executor on the Spark UI, which is understandable because we are no longer using the Spark standalone cluster mode. But now all the cores are utilized when I check with top + 1 on the worker node, which suggests that the problem is not with the application code but with how Spark standalone mode uses the resources.
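For reference, here is a minimal sketch of the standalone-mode settings that, as far as I understand, govern how cores are handed to executors and tasks. The values are only assumptions about my setup, not something I have confirmed to fix the issue:

// Sketch only: these config keys exist in Spark 1.6; the values are assumptions.
import org.apache.spark.{SparkConf, SparkContext}

val master = sys.env.getOrElse("MASTER", "localhost")   // stand-in for the $MASTER used in my script
val conf = new SparkConf()
  .setAppName("DockerWithML")
  .setMaster(s"spark://$master:7077")
  .set("spark.cores.max", "16")       // total cores the application may claim across the cluster
  .set("spark.executor.cores", "16")  // cores offered to each executor
  .set("spark.task.cpus", "1")        // cores reserved per task; a larger value would limit concurrent tasks
val sc = new SparkContext(conf)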
So how does Spark decide to use only one core per executor when it has 16 cores and enough partitions? What can I change so that it utilizes all the cores?
I am using spark-on-slurm to launch the jobs.
The Spark configurations in both cases are as follows:
--master spark://MASTER:7077
(spark.app.name,DockerWithML)
(spark.jars,file:/proj/b2015245/bin/spark-vs/vs.examples/target/vs.examples-0.0.1-jar-with-dependencies.jar)
(spark.app.id,app-20170427153813-0000)
(spark.executor.memory,120G)
(spark.executor.id,driver)
(spark.driver.memory,10G)
(spark.history.fs.logDirectory,/proj/b2015245/nobackup/eventLogging/)
(spark.externalBlockStore.folderName,spark-75831ca4-1a8b-4364-839e-b035dcf1428d)
(spark.driver.maxResultSize,2g)
(spark.executorEnv.OE_LICENSE,/scratch/10230979/SureChEMBL/oe_license.txt)
(spark.driver.port,34379)
(spark.submit.deployMode,client)
(spark.driver.host,x.x.x.124)
(spark.master,spark://m124.uppmax.uu.se:7077)
--master local[*]
(spark.app.name,DockerWithML)
(spark.app.id,local-1493296508581)
(spark.externalBlockStore.folderName,spark-4098cf14-abad-4453-89cd-3ce3603872f8)
(spark.jars,file:/proj/b2015245/bin/spark-vs/vs.examples/target/vs.examples-0.0.1-jar-with-dependencies.jar)
(spark.driver.maxResultSize,2g)
(spark.master,local[*])
(spark.executor.id,driver)
(spark.submit.deployMode,client)
(spark.driver.memory,10G)
(spark.driver.host,x.x.x.124)
(spark.history.fs.logDirectory,/proj/b2015245/nobackup/eventLogging/)
(spark.executorEnv.OE_LICENSE,/scratch/10230648/SureChEMBL/oe_license.txt)
(spark.driver.port,36008)
Thanks,