1 vote

I was trying to set up Spark Standalone Mode following the tutorial at http://spark.apache.org/docs/latest/spark-standalone.html.

The tutorial says that we can pass "-c CORES" to the worker to set the total number of CPU cores it is allowed to use. Based on my understanding of the tutorial, I tried to use only 16 of the 40 cores on my worker machine by starting the cluster with:

${SPARK_HOME}/sbin/start-all.sh -c 16

Then I ran a job with spark-submit and looked at the web UI at master:8080. However, it showed that the job was still using all 40 cores instead of 16. I read start-master.sh and start-slave.sh in ${SPARK_HOME}/sbin, and I don't think they actually parse those arguments.

So the only way I can limit the number of cores for an application at the moment is to set SPARK_WORKER_CORES in ${SPARK_HOME}/conf/spark-env.sh.
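
For reference, the workaround is a single line in that file (the 16 just mirrors my case; pick whatever fits your machine), and as far as I can tell it only takes effect after the workers are restarted:

# ${SPARK_HOME}/conf/spark-env.sh
# Cap each worker started on this machine at 16 of its 40 cores
SPARK_WORKER_CORES=16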

I am wondering how I could use the -c argument as discussed in the tutorial.

2 Answers

3 votes

The issue I submitted (issues.apache.org/jira/browse/SPARK-11841) was answered. -c, -m and -d can only be passed to start-slave.sh, and you have to run that script on the machine you want to use as a worker. The usage is:

start-slave.sh <masterURL> [options]

For example, to start a slave with 16 cores, we can use:

start-slave.sh spark://master_ip:7077 -c 16
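
Spelled out end to end (master_ip is a placeholder and 16 is just the value from my question), the sequence is roughly:

# On the master machine
${SPARK_HOME}/sbin/start-master.sh

# On each machine that should act as a worker: register with the master
# and cap this worker at 16 cores
${SPARK_HOME}/sbin/start-slave.sh spark://master_ip:7077 -c 16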

2 votes

That argument is - as you have seen in the scripts - not supported, AFAIK. The correct approach is to use a combination of

spark-defaults.conf,
spark-env.sh, and
command-line settings, e.g. --executor-cores,

to indicate the number of cores to use for a given task.
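
For example, a per-executor core limit can be expressed either in spark-defaults.conf or on the spark-submit command line; a rough sketch, where the value 4, the class name, and the jar are only placeholders:

# ${SPARK_HOME}/conf/spark-defaults.conf
spark.executor.cores 4

# or, for a single application, at submission time:
spark-submit --executor-cores 4 --class com.example.MyApp my-app.jar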