
I have a cluster of 3 macOS machines running Hadoop and Spark-1.5.2 (the same problem exists with Spark-2.0.0). With 'yarn' as the Spark master URL, I am running into a strange issue where tasks are only allocated to 2 of the 3 machines.

Based on the Hadoop dashboard (port 8088 on the master) it is clear that all 3 nodes are part of the cluster. However, any Spark job I run only uses 2 executors.

For example, here is the "Executors" tab from a lengthy run of the JavaWordCount example: [screenshot of the Executors tab]. "batservers" is the master. There should be an additional slave, "batservers2", but it's just not there.

Why might this be?

Note that none of my YARN or Spark (or, for that matter, HDFS) configurations are unusual, except for provisions giving the YARN ResourceManager and NodeManagers extra memory.
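Those provisions amount to heap-size overrides for the YARN daemons in yarn-env.sh; a sketch of the sort of thing I mean (the values are illustrative, not my exact configuration):

    # yarn-env.sh -- illustrative heap sizes for the YARN daemons, not my exact values
    export YARN_RESOURCEMANAGER_HEAPSIZE=2048
    export YARN_NODEMANAGER_HEAPSIZE=2048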


1 Answer


Remarkably, all it took was a detailed look at the spark-submit help message to discover the answer:

    YARN-only:
      ...
      --num-executors NUM    Number of executors to launch (Default: 2).

If I specify --num-executors 3 in my spark-submit command, the 3rd node is used.
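A minimal sketch of such a submission, assuming the stock JavaWordCount example jar (the jar path, Spark version, and input file are placeholders; adjust them to your installation):

    # hypothetical paths -- only --num-executors 3 is the essential part
    spark-submit \
      --master yarn \
      --num-executors 3 \
      --class org.apache.spark.examples.JavaWordCount \
      $SPARK_HOME/lib/spark-examples-1.5.2-hadoop2.6.0.jar \
      hdfs:///user/me/input.txt

The default of 2 also explains the original symptom: with no flag set, only two of the three nodes ever received executors.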