3 votes

I'm writing a Spark application and using sbt assembly to create a fat jar, which I can send to spark-submit (through Amazon EMR). My application uses typesafe-config, with a reference.conf file inside my resources directory. My jar is on Amazon S3, and I use the command aws emr add-steps ... to create a new Spark job (which downloads the jar to the cluster and sends it to spark-submit). I know that, in general, I can use an application.conf file to override the settings. However, since I'm using Spark (and a fat jar), I need some way to deploy my overrides.

What is the recommended way of overriding the application config settings when using spark?
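For illustration, suppose the bundled reference.conf carries a default like the following (the key my.app.config.value is hypothetical, chosen to match the example in the first answer below):

    # reference.conf (bundled in the fat jar) -- hypothetical key, for illustration only
    my.app {
      config.value = 10
    }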


2 Answers

4 votes

You can pass overrides as JVM system properties on the spark-submit command line, for example:

    spark-submit --driver-java-options "-Dmy.app.config.value=50 -Dconfig.file=other.conf" ... fat.jar

Note that spark-submit's --conf flag only accepts spark.* properties (other keys are ignored with a warning), so Typesafe Config overrides have to go through system properties. If the driver runs on the cluster (e.g. --deploy-mode cluster on EMR), also pass --files other.conf so the file is shipped to the driver's working directory.

When using com.typesafe.config.ConfigFactory.load(), values specified as system properties override those specified in 'other.conf', which in turn override those specified in the 'reference.conf' inside your fat jar.
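As a minimal sketch of how your application would then read the merged configuration (my.app.config.value is the same hypothetical key as above):

    import com.typesafe.config.Config;
    import com.typesafe.config.ConfigFactory;

    public class AppConfigExample {
        public static void main(String[] args) {
            // load() merges, in decreasing priority: JVM system properties,
            // the file named by -Dconfig.file (or application.conf), then reference.conf
            Config config = ConfigFactory.load();
            int value = config.getInt("my.app.config.value"); // 50 with the -D override above
            System.out.println("my.app.config.value = " + value);
        }
    }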

0 votes

In my Spark Java code, I override the configuration programmatically like this:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // Values set here take effect regardless of spark-submit flags
    SparkConf sparkConf = new SparkConf();
    sparkConf.setMaster(sparkMaster); // e.g. "yarn"; sparkMaster is defined elsewhere
    sparkConf.set("spark.executor.memory", "1024M");
    sparkConf.set("spark.default.parallelism", "48");
    sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
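One caveat with this approach: values set directly on a SparkConf in code take precedence over spark-submit --conf flags and spark-defaults.conf, so the settings are effectively hard-wired into the jar. It also only covers Spark's own spark.* properties; Typesafe Config (reference.conf) overrides still have to be passed as system properties, as described in the first answer.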