3 votes

I am running Spark in an HPC environment on Slurm, using Spark standalone mode, Spark version 1.6.1. The problem is that my Slurm node is not fully used in Spark standalone mode. I am using spark-submit in my Slurm script. There are 16 cores available on a node, and I get all 16 cores per executor as shown on the Spark UI, but only one core per executor is actually utilized. The top + 1 command on the worker node where the executor process is running shows that only one of the 16 CPUs is being used. I have 255 partitions, so the number of partitions does not seem to be the problem here.

$SPARK_HOME/bin/spark-submit \
            --class se.uu.farmbio.vs.examples.DockerWithML \
            --master spark://$MASTER:7077 \
            --executor-memory 120G \
            --driver-memory 10G \

When I change the script to

$SPARK_HOME/bin/spark-submit \
            --class se.uu.farmbio.vs.examples.DockerWithML \
            --master local[*] \
            --executor-memory 120G \
            --driver-memory 10G \

I see 0 cores allocated to the executor on the Spark UI, which is understandable because we are no longer using Spark standalone cluster mode. But now all the cores are utilized when I check with top + 1 on the worker node, which hints that the problem is not with the application code but with how Spark standalone mode utilizes the resources.

So how does Spark decide to use one core per executor when it has 16 cores and enough partitions? What can I change so that it utilizes all the cores?

I am using spark-on-slurm for launching the jobs.

The Spark configurations in both cases are as follows:

--master spark://MASTER:7077

(spark.app.name,DockerWithML)                       
(spark.jars,file:/proj/b2015245/bin/spark-vs/vs.examples/target/vs.examples-0.0.1-jar-with-dependencies.jar)                        
(spark.app.id,app-20170427153813-0000)                      
(spark.executor.memory,120G)                        
(spark.executor.id,driver)                      
(spark.driver.memory,10G)                       
(spark.history.fs.logDirectory,/proj/b2015245/nobackup/eventLogging/)                       
(spark.externalBlockStore.folderName,spark-75831ca4-1a8b-4364-839e-b035dcf1428d)                        
(spark.driver.maxResultSize,2g)                     
(spark.executorEnv.OE_LICENSE,/scratch/10230979/SureChEMBL/oe_license.txt)                      
(spark.driver.port,34379)                       
(spark.submit.deployMode,client)                        
(spark.driver.host,x.x.x.124)                       
(spark.master,spark://m124.uppmax.uu.se:7077)

--master local[*]

(spark.app.name,DockerWithML)                                   
(spark.app.id,local-1493296508581)                                  
(spark.externalBlockStore.folderName,spark-4098cf14-abad-4453-89cd-3ce3603872f8)                                    
(spark.jars,file:/proj/b2015245/bin/spark-vs/vs.examples/target/vs.examples-0.0.1-jar-with-dependencies.jar)                                    
(spark.driver.maxResultSize,2g)                                 
(spark.master,local[*])                                 
(spark.executor.id,driver)                                  
(spark.submit.deployMode,client)                                    
(spark.driver.memory,10G)                                   
(spark.driver.host,x.x.x.124)                                   
(spark.history.fs.logDirectory,/proj/b2015245/nobackup/eventLogging/)                                   
(spark.executorEnv.OE_LICENSE,/scratch/10230648/SureChEMBL/oe_license.txt)                                  
(spark.driver.port,36008)

Thanks,

What kind of job are you running? And what is the data source? Some jobs are not parallelizable, and some data sources like compressed files can't be partitioned initially. – Vidya
The application is parallelizable and works well in a cloud environment. Because of limited resources in the cloud, I had to move to HPC. The files are not compressed; they are SDF files with molecules. – Laeeq

3 Answers

2 votes

The problem is that you only have one worker node. In Spark standalone mode, one executor is launched per worker instance. To launch multiple logical worker instances, and hence multiple executors within one physical worker, you need to configure the SPARK_WORKER_INSTANCES property.

By default, it is set to 1. You can increase it based on the computation your code is doing, so that the available resources are actually used.

You want your job to be distributed among executors to utilize the resources properly, but what's happening is that only one executor gets launched, which cannot make use of the number of cores and the amount of memory you have. So you are not getting the flavor of Spark's distributed computation.

You can set SPARK_WORKER_INSTANCES=5 and allocate 2 cores per executor, so 10 cores would be utilized properly. You tune the configuration like this to get the optimum performance.
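
For concreteness, a minimal spark-env.sh sketch for that layout (5 logical workers on the one physical node, 2 cores each); the concrete numbers are assumptions and should be tuned to the node:

# conf/spark-env.sh on the worker node (sketch; the numbers are assumptions)
export SPARK_WORKER_INSTANCES=5   # 5 worker processes on this node
export SPARK_WORKER_CORES=2       # each worker offers 2 cores
export SPARK_WORKER_MEMORY=24G    # each worker offers 24G (120G split 5 ways)

With this layout the application can get up to 5 executors on the node (one per worker instance), as long as --executor-memory fits within SPARK_WORKER_MEMORY.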

1 vote

Try setting spark.executor.cores explicitly (in YARN mode it defaults to 1; in standalone mode it defaults to all the available cores on the worker).

According to the Spark documentation:

The number of cores to use on each executor. For YARN and standalone mode only. In standalone mode, setting this parameter allows an application to run multiple executors on the same worker, provided that there are enough cores on that worker. Otherwise, only one executor per application will run on each worker.

See https://spark.apache.org/docs/latest/configuration.html
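
As a sketch, applied to the spark-submit line from the question (the 4-core / 28G split is only an assumption, chosen so that several executors fit into the worker's 120G):

$SPARK_HOME/bin/spark-submit \
            --class se.uu.farmbio.vs.examples.DockerWithML \
            --master spark://$MASTER:7077 \
            --executor-cores 4 \
            --executor-memory 28G \
            --driver-memory 10G \

With spark.executor.cores=4 on a 16-core worker, the standalone master can place up to four executors of the application on that worker, which together can use all 16 cores.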

0 votes

In Spark cluster mode you should use the --num-executors option, set to the number of cores per node multiplied by the number of nodes. For example, if you have 3 nodes with 8 cores per node, you should write --num-executors 24.
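
A sketch of how that could look for the 3-node, 8-cores-per-node example; note that --num-executors is read when running on YARN, while on the standalone master used in the question the total core count is capped with --total-executor-cores instead:

# sketch, assuming a YARN cluster with 3 nodes x 8 cores each
$SPARK_HOME/bin/spark-submit \
            --class se.uu.farmbio.vs.examples.DockerWithML \
            --master yarn \
            --num-executors 24 \
            --executor-cores 1 \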