
I set up a Hortonworks Hadoop cluster:

  • Hortonworks version is 2.3.2.
  • 1 NameNode, 1 Secondary NameNode, 10 DataNodes
  • Spark 1.4.1, deployed on all data nodes.
  • YARN is installed.

When I run a Spark program, executors only run on 4 nodes, not on all of the data nodes.

How can I estimate the number of Spark executors on such a Hadoop cluster?


1 Answer


The number of executors you request is 4 by default. If you want more, pass --num-executors x on the spark-submit command line, or set spark.executor.instances in the configuration. More details here: https://spark.apache.org/docs/latest/running-on-yarn.html
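A minimal sketch of such a submission; the executor cores/memory values and the application jar name are placeholders, not values from the question:

```shell
# Request one executor per DataNode (10 in this cluster).
# Setting spark.executor.instances in spark-defaults.conf has the same effect.
spark-submit \
  --master yarn-client \
  --num-executors 10 \
  --executor-cores 2 \
  --executor-memory 2g \
  your_app.jar
```

YARN will only grant all 10 executors if each node has enough free vcores and memory for the requested --executor-cores and --executor-memory.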

Because Spark runs on Hortonworks Hadoop under YARN, every node that should host executors must be running a YARN NodeManager (with the Spark client deployed); otherwise no executors can be scheduled on that node.

The actual number of executors you get is therefore bounded by the smaller of the number of NodeManagers and the requested --num-executors.
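A rough back-of-the-envelope estimate of the upper bound. The per-node resource figures below are assumptions for illustration, not values taken from the question:

```shell
# Rough executor-count estimate; adjust the per-node figures for your cluster.
nodes=10              # NodeManagers (one per DataNode)
cores_per_node=8      # assumed vcores available to YARN per node
executor_cores=2      # cores requested per executor (--executor-cores)
executors_per_node=$(( cores_per_node / executor_cores ))
max_executors=$(( nodes * executors_per_node ))
echo "$max_executors"   # upper bound on concurrently running executors
```

Memory imposes an analogous per-node limit (yarn.nodemanager.resource.memory-mb divided by the executor memory plus overhead); the effective bound is the smaller of the two.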