3 votes

I'm writing a Spark application and using sbt assembly to create a fat jar, which I can send to spark-submit (through Amazon EMR). My application uses typesafe-config, with a reference.conf file inside my resources directory. My jar is on Amazon S3, and I use the command aws emr add-steps ... to create a new Spark job (which downloads the jar to the cluster and sends it to spark-submit). I know that, in general, I can use an application.conf file to override the settings. However, since I'm using Spark (and a fat jar), I need some way to deploy my overrides.

What is the recommended way of overriding the application config settings when using spark?
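For illustration, suppose the bundled reference.conf carries a default like the following (the key my.app.config.value is hypothetical, chosen to match the example in the first answer below):

    # reference.conf (bundled in the fat jar) -- hypothetical key, for illustration only
    my.app {
      config.value = 10
    }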


2 Answers

4 votes

You can pass overrides as JVM system properties on the spark-submit command line, for example:

    spark-submit --driver-java-options "-Dmy.app.config.value=50 -Dconfig.file=other.conf" ... fat.jar

Note that spark-submit's --conf flag only accepts spark.* properties (other keys are ignored with a warning), so Typesafe Config overrides have to go through system properties. If the driver runs on the cluster (e.g. --deploy-mode cluster on EMR), also pass --files other.conf so the file is shipped to the driver's working directory.

When using com.typesafe.config.ConfigFactory.load(), values specified as system properties override those specified in 'other.conf', which in turn override those specified in the 'reference.conf' inside your fat jar.
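As a minimal sketch of how your application would then read the merged configuration (my.app.config.value is the same hypothetical key as above):

    import com.typesafe.config.Config;
    import com.typesafe.config.ConfigFactory;

    public class AppConfigExample {
        public static void main(String[] args) {
            // load() merges, in decreasing priority: JVM system properties,
            // the file named by -Dconfig.file (or application.conf), then reference.conf
            Config config = ConfigFactory.load();
            int value = config.getInt("my.app.config.value"); // 50 with the -D override above
            System.out.println("my.app.config.value = " + value);
        }
    }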

0 votes

In my Spark Java code, I override the configuration programmatically like this:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // Values set here take effect regardless of spark-submit flags
    SparkConf sparkConf = new SparkConf();
    sparkConf.setMaster(sparkMaster); // e.g. "yarn"; sparkMaster is defined elsewhere
    sparkConf.set("spark.executor.memory", "1024M");
    sparkConf.set("spark.default.parallelism", "48");
    sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
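One caveat with this approach: values set directly on a SparkConf in code take precedence over spark-submit --conf flags and spark-defaults.conf, so the settings are effectively hard-wired into the jar. It also only covers Spark's own spark.* properties; Typesafe Config (reference.conf) overrides still have to be passed as system properties, as described in the first answer.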