
I'm a bit new to Spark and trying to understand a few terms (I couldn't figure them out from online resources).

Please first validate my understanding of the two terms below:

Executor: It is a container or JVM process that runs on a worker node (data node). We can have multiple executors per node.

Core: It is a task slot, i.e. a thread, inside an executor's container or JVM process on a worker node. We can have multiple cores or threads per executor.

Please correct me if I am wrong about the above two concepts.
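
To make my understanding concrete, here is a minimal sketch of how I would expect to set these in code (assuming the Spark Scala API; the app name and all numbers are hypothetical, purely for illustration):

    import org.apache.spark.sql.SparkSession

    // Hypothetical values, for illustration only.
    val spark = SparkSession.builder()
      .appName("my-test-app")
      .config("spark.executor.instances", "6") // executors: JVM processes on worker nodes
      .config("spark.executor.cores", "2")     // cores: concurrent task threads per executor
      .config("spark.executor.memory", "2g")   // heap memory per executor JVM
      .getOrCreate()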

Questions:

  1. Whenever we submit a Spark job, what does that actually mean? Are we handing our job over to YARN (the resource manager), which then assigns resources to my application in the cluster and executes it? Is that the correct understanding?
  2. In the command used to submit a job to a Spark cluster, there is an option to set the number of executors:

    spark-submit --class <CLASS_NAME> --num-executors ? --executor-cores ? --executor-memory ? ...

So are the number of executors and cores set per node? If not, how can we set a specific number of cores per node?


1 Answer


All of your assumptions are correct. For a detailed explanation of the cluster architecture, please go through this link; it will give you a clear picture. Regarding your second question: --num-executors is set for the entire cluster, not per node. An upper bound on the number of executors is calculated as below:

(num-cores-per-node * total-nodes-in-cluster) / executor-cores

For example, suppose you have a 20-node cluster with 4-core machines, and you submit an application with --executor-memory 1G and --total-executor-cores 8. Then Spark will launch eight executors, each with 1 GB of RAM, on different machines. Spark does this by default to give applications a chance to achieve data locality for distributed filesystems running on the same machines (e.g., HDFS), because these systems typically have data spread out across all nodes.
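
As a back-of-the-envelope check of the arithmetic in that example (a sketch only; the numbers come from the example above, and the placement described is standalone mode's default spread-out scheduling):

    // Numbers from the example above (hypothetical 20-node cluster).
    val nodes = 20
    val coresPerNode = 4
    val totalClusterCores = nodes * coresPerNode // 80 cores available in total
    val totalExecutorCores = 8                   // --total-executor-cores 8
    // With the default spread-out placement, those 8 cores land on 8 different
    // workers, i.e. 8 executors, each with 1 core and 1 GB of RAM.
    val executors = totalExecutorCores.min(nodes)
    println(s"$executors executors across a $totalClusterCores-core cluster")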

I hope this helps!