
I set up a Hortonworks Hadoop cluster:

  • Hortonworks version is 2.3.2.
  • 1 NameNode, 1 Secondary NameNode, 10 DataNodes
  • Spark 1.4.1, deployed on all data nodes.
  • YARN is installed.

When I run a Spark program, executors only run on 4 nodes, not on all of the data nodes.

How can I estimate the number of Spark executors on such a Hadoop cluster?


1 Answer


The number of executors you request is 4 by default. If you want more, pass --num-executors x on the spark-submit command line, or set spark.executor.instances in the configuration. More details here: https://spark.apache.org/docs/latest/running-on-yarn.html
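A minimal sketch of such a submission; the executor cores/memory values and the application jar name are placeholders, not values from the question:

```shell
# Request one executor per DataNode (10 in this cluster).
# Setting spark.executor.instances in spark-defaults.conf has the same effect.
spark-submit \
  --master yarn-client \
  --num-executors 10 \
  --executor-cores 2 \
  --executor-memory 2g \
  your_app.jar
```

YARN will only grant all 10 executors if each node has enough free vcores and memory for the requested --executor-cores and --executor-memory.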

Because Spark runs on Hortonworks Hadoop under YARN, every node that should host executors must be running a YARN NodeManager (with the Spark client deployed); otherwise no executors can be scheduled on that node.

The actual number of executors you get is therefore bounded by the smaller of the number of NodeManagers and the requested --num-executors.
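A rough back-of-the-envelope estimate of the upper bound. The per-node resource figures below are assumptions for illustration, not values taken from the question:

```shell
# Rough executor-count estimate; adjust the per-node figures for your cluster.
nodes=10              # NodeManagers (one per DataNode)
cores_per_node=8      # assumed vcores available to YARN per node
executor_cores=2      # cores requested per executor (--executor-cores)
executors_per_node=$(( cores_per_node / executor_cores ))
max_executors=$(( nodes * executors_per_node ))
echo "$max_executors"   # upper bound on concurrently running executors
```

Memory imposes an analogous per-node limit (yarn.nodemanager.resource.memory-mb divided by the executor memory plus overhead); the effective bound is the smaller of the two.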