Is there a mapping/translation from the number of hardware systems, CPU cores, and their associated memory to the spark-submit tunables: executor-memory, executor-cores, num-executors? The right values certainly depend on the application; I am, however, looking for a basic rule of thumb. Apache Spark is running on YARN with HDFS in cluster mode. Not all the hardware systems in the Spark/Hadoop YARN cluster have the same number of CPU cores or the same amount of RAM.
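For context, these three tunables are passed on the spark-submit command line. A minimal illustrative invocation (the application jar name and all resource values below are placeholders, not a recommendation):

```shell
# Placeholder values -- tune per cluster and workload
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  app.jar
```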
1
votes
I think the general idea is to oversubscribe to resources on the cluster and let the Spark driver determine the best configuration.
– Saif Charaniya
How about a mapping of the Spark tunables with respect to CPU cores and RAM?
– user5191140
I don't think there's much of a mapping available, but there are preferred hardware provisions: spark.apache.org/docs/latest/hardware-provisioning.html
– Saif Charaniya
1 Answer
0
votes
There is no single rule of thumb, but you can derive a suitable configuration after accounting for:
- off-heap memory
- the number of applications and other Hadoop daemons running on each node
- Resource Manager needs
- HDFS I/O
etc.
Please check this URL