
I'm trying to increase memory allocation for my executors and drivers in Spark, but I have the strange feeling that Spark is ignoring my configurations.

I'm using the following commands:

spark-submit spark_consumer.py --driver-memory=10G --executor-memory=5G --conf spark.executor.extraJavaOptions='-XX:+UseParallelGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps'

My initialization code is

from pyspark import SparkContext
from pyspark.sql import SQLContext

class SparkRawConsumer:

    def __init__(self, filename):
        self.sparkContext = SparkContext.getOrCreate()

        self.sqlContext = SQLContext(self.sparkContext)

Theoretically, my driver program should have 10GB of memory available in total. However, the Spark UI shows less than 400MB of memory available.

Why is Spark ignoring the configurations I am passing in?


2 Answers


There are 3 different ways to define Spark configuration:

1) spark-env.sh

2) spark-submit parameters

3) hard-coding it in SparkConf, for example: sparkConf.set("spark.driver.memory","10G");

The priority is: hard-coded > spark-submit > spark-env.sh.

If you think your parameters are being overwritten by something else, you can check with: sparkConf.getOption("spark.driver.memory");

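If you want to be sure that your options are not overwritten, hard-code them. One exception: in client mode, spark.driver.memory set through SparkConf has no effect, because the driver JVM has already started by that point; pass it with --driver-memory instead.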

You can see all options here: https://spark.apache.org/docs/latest/configuration.html
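The precedence order can be pictured with a small toy model. This is plain Python rather than Spark code, and it only mirrors the ordering described in this answer; the three dictionaries, their values, and the effective_setting helper are hypothetical names made up for illustration.

```python
from collections import ChainMap

# Hypothetical values for illustration only; none come from a real cluster.
hard_coded = {"spark.driver.memory": "10g"}        # set in code on the SparkConf
submit_flags = {"spark.driver.memory": "8g",
                "spark.executor.memory": "5g"}     # passed to spark-submit
env_defaults = {"spark.executor.memory": "1g",
                "spark.executor.cores": "2"}       # lowest-priority defaults

# ChainMap searches its mappings left to right, so earlier sources win.
effective = ChainMap(hard_coded, submit_flags, env_defaults)


def effective_setting(key, default=None):
    """Return the value from the highest-priority source that defines key."""
    return effective.get(key, default)


print(effective_setting("spark.driver.memory"))    # defined in all but the last source
print(effective_setting("spark.executor.memory"))  # not hard-coded, so the flag wins
print(effective_setting("spark.executor.cores"))   # only present in the defaults
```

A lookup falls through to a lower-priority source only when no higher-priority source defines the key, which is why a hard-coded value hides whatever was passed on the command line.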


The issue here was that I had specified the parameters in the wrong order. Running spark-submit --help shows the ordering that spark-submit expects for its input parameters:

Usage: spark-submit [options] <app jar | python file | R file> [app arguments]

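Everything after the Python file is treated as an application argument and passed to the script, so my memory flags never reached spark-submit. Once I moved the options in front of the file name, I was able to increase memory on my PySpark app: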

spark-submit --driver-memory 8G --executor-memory 8G spark_consumer.py
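To see why the ordering matters, here is a rough sketch of how an argument list following that usage line could be split. It is not spark-submit's actual parser: split_submit_args is a hypothetical helper, and it assumes every option is written as either --name=value or --name value (real spark-submit also has flags that take no value).

```python
def split_submit_args(argv):
    """Split a spark-submit style argument list into (options, app, app_args).

    Simplifying assumption: every option is '--name=value' or '--name value'.
    """
    options = {}
    i = 0
    # Consume leading options until the first token that is not an option.
    while i < len(argv) and argv[i].startswith("--"):
        token = argv[i]
        if "=" in token:
            name, value = token[2:].split("=", 1)
            i += 1
        else:
            name = token[2:]
            value = argv[i + 1] if i + 1 < len(argv) else None
            i += 2
        options[name] = value
    # The first non-option token is the application file.
    app = argv[i] if i < len(argv) else None
    # Everything after the application file belongs to the application.
    app_args = argv[i + 1:]
    return options, app, app_args


# Ordering from the question: the flags come after the Python file.
wrong = split_submit_args(
    ["spark_consumer.py", "--driver-memory=10G", "--executor-memory=5G"])

# Corrected ordering: the flags come before the Python file.
right = split_submit_args(
    ["--driver-memory", "8G", "--executor-memory", "8G", "spark_consumer.py"])

print(wrong)
print(right)
```

With the original ordering the options dictionary stays empty and both memory flags end up in app_args; with the corrected ordering they are recognized as options and app_args is empty.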