I am confused about how to deal with executor memory and driver memory in Spark.
My environment settings are as follows:
- Memory: 128 GB, 16 CPUs, for 9 VMs
- CentOS
- Hadoop 2.5.0-cdh5.2.0
- Spark 1.1.0
Input data information:
- 3.5 GB data file from HDFS
For simplicity during development, I executed my Python code in standalone cluster mode (8 workers, 20 cores, 45.3 GB memory) with spark-submit. Now I would like to set executor memory or driver memory for performance tuning.
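For context, this is roughly the shape of spark-submit call I am experimenting with; the master URL, memory sizes, core count, script name, and input path below are all placeholders rather than my real values:

```
# A sketch only: every value below (master URL, memory sizes, core count,
# script name, input path) is a placeholder, not my actual configuration.
spark-submit \
  --master spark://master-host:7077 \
  --driver-memory 4g \
  --executor-memory 8g \
  --total-executor-cores 20 \
  my_app.py hdfs:///path/to/input
```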
From the Spark documentation, the definition of executor memory is:
"Amount of memory to use per executor process, in the same format as JVM memory strings (e.g. 512m, 2g)."
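As far as I understand, this corresponds to the spark.executor.memory configuration property, which could also be set in conf/spark-defaults.conf instead of on the command line; the value here is just an example, not my actual setting:

```
# conf/spark-defaults.conf (example value only)
spark.executor.memory   8g
```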
How about driver memory?