1 vote

I was trying to set up Spark Standalone Mode following the tutorial at http://spark.apache.org/docs/latest/spark-standalone.html.

The tutorial says that we can pass "-c CORES" to the worker to set the total number of CPU cores it is allowed to use. Based on my understanding of the tutorial, I tried to use only 16 of the 40 cores on my worker machine by starting the cluster with:

${SPARK_HOME}/sbin/start-all.sh -c 16

Then I ran a job with spark-submit and looked at the web UI at master:8080. However, it showed that the job was still using all 40 cores instead of 16. I read start-master.sh and start-slave.sh in ${SPARK_HOME}/sbin, and I don't think they actually parse those arguments.

So the only way I can limit the number of cores for an application at the moment is to set SPARK_WORKER_CORES in ${SPARK_HOME}/conf/spark-env.sh.
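
For reference, the workaround is a single line in that file (the 16 just mirrors my case; pick whatever fits your machine), and as far as I can tell it only takes effect after the workers are restarted:

# ${SPARK_HOME}/conf/spark-env.sh
# Cap each worker started on this machine at 16 of its 40 cores
SPARK_WORKER_CORES=16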

I am wondering how I could use the -c argument as discussed in the tutorial.

2 Answers

3 votes

The issue I submitted (issues.apache.org/jira/browse/SPARK-11841) was answered. -c, -m and -d can only be passed to start-slave.sh, and you have to run that script on the machine you want to use as a worker. The usage is:

start-slave.sh <masterURL> [options]

For example, to start a slave with 16 cores, we can use:

start-slave.sh spark://master_ip:7077 -c 16
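
Spelled out end to end (master_ip is a placeholder and 16 is just the value from my question), the sequence is roughly:

# On the master machine
${SPARK_HOME}/sbin/start-master.sh

# On each machine that should act as a worker: register with the master
# and cap this worker at 16 cores
${SPARK_HOME}/sbin/start-slave.sh spark://master_ip:7077 -c 16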

2 votes

That argument is - as you have seen in the scripts - not supported, AFAIK. The correct approach is to use a combination of

spark-defaults.conf,
spark-env.sh, and
command-line settings, e.g. --executor-cores,

to indicate the number of cores to use for a given task.
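
For example, a per-executor core limit can be expressed either in spark-defaults.conf or on the spark-submit command line; a rough sketch, where the value 4, the class name, and the jar are only placeholders:

# ${SPARK_HOME}/conf/spark-defaults.conf
spark.executor.cores 4

# or, for a single application, at submission time:
spark-submit --executor-cores 4 --class com.example.MyApp my-app.jar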