
I'm writing a Spark application and using sbt-assembly to create a fat jar that I can pass to spark-submit (through Amazon EMR). My application uses typesafe-config, with a reference.conf file in my resources directory. The jar is on Amazon S3, and I use the command aws emr add-steps.. to create a new Spark job (which downloads the jar to the cluster and passes it to spark-submit). I know that, in general, I can use application.conf to override the settings. However, since I'm using Spark (and a fat jar), I need some way to deploy my override.

What is the recommended way of overriding the application config settings when using Spark?


2 Answers


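You can pass the overrides to the driver JVM as system properties: spark-submit ... --driver-java-options "-Dmy.app.config.value=50 -Dconfig.file=other.conf" ... fat.jar. A plain --conf key=value is only honored for properties whose names start with spark., so the values have to go in as -D options. On YARN in cluster mode, you would typically also ship the file with --files other.conf so that the driver can find it.

When using com.typesafe.config.ConfigFactory.load(), values specified on the command line override those in other.conf, which in turn override those in the reference.conf inside your fat jar.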
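To make that precedence order concrete, here is a minimal sketch in plain Java that models the three layers with java.util.Properties. It is only a stand-in for illustration and not how typesafe-config is implemented; the class name LayeredConfigSketch, the method layered, and the keys my.app.name and my.app.mode are hypothetical.

```java
import java.util.Properties;

// Illustrative model only: mimics the lookup order
// system properties -> override file -> reference defaults.
class LayeredConfigSketch {

    // Returns a Properties object whose lookups fall through the three layers.
    static Properties layered(Properties reference, Properties overrideFile, Properties systemProps) {
        Properties fileLayer = new Properties(reference);  // falls back to reference
        fileLayer.putAll(overrideFile);
        Properties top = new Properties(fileLayer);        // falls back to fileLayer
        top.putAll(systemProps);
        return top;
    }

    public static void main(String[] args) {
        // Stands in for reference.conf inside the fat jar (hypothetical values).
        Properties reference = new Properties();
        reference.setProperty("my.app.config.value", "10");
        reference.setProperty("my.app.name", "demo");

        // Stands in for other.conf (hypothetical values).
        Properties overrideFile = new Properties();
        overrideFile.setProperty("my.app.config.value", "20");
        overrideFile.setProperty("my.app.mode", "batch");

        // Stands in for -Dmy.app.config.value=50 on the command line.
        Properties systemProps = new Properties();
        systemProps.setProperty("my.app.config.value", "50");

        Properties config = layered(reference, overrideFile, systemProps);
        System.out.println(config.getProperty("my.app.config.value")); // 50 (command line wins)
        System.out.println(config.getProperty("my.app.mode"));         // batch (from override file)
        System.out.println(config.getProperty("my.app.name"));         // demo (from reference)
    }
}
```

A key set in several layers resolves to the highest layer that defines it, while keys that appear only in a lower layer still come through unchanged.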


In my Spark Java code, I override the configuration by setting the Spark properties on the SparkConf before creating the context:

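    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // Set Spark properties programmatically; they must be set before the context is created.
    SparkConf sparkConf = new SparkConf();
    sparkConf.set("spark.executor.memory", "1024M");
    sparkConf.set("spark.default.parallelism", "48");
    sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);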