I'm learning Spark these days, but I'm a little confused by Spark configuration. AFAIK, there are at least three ways to configure it:
- Environment variables, as described at http://spark.apache.org/docs/latest/spark-standalone.html
- Command-line arguments, like `./bin/spark-submit --class <main-class> --master xxx --deploy-mode xxx --conf key=value`
- Code, i.e. setting properties programmatically in Scala/Java (see the sketch below)
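To illustrate what I mean by the third option, here's a minimal Scala sketch of setting properties in code via `SparkSession.builder` (the app name, master URL, and memory value are just placeholders, not recommendations):

```scala
import org.apache.spark.sql.SparkSession

object ConfigExample {
  def main(args: Array[String]): Unit = {
    // Set configuration properties directly in code via the builder.
    // "local[*]" and "2g" are placeholder values for illustration only.
    val spark = SparkSession.builder()
      .appName("ConfigExample")
      .master("local[*]")
      .config("spark.executor.memory", "2g")
      .getOrCreate()

    // Inspect the effective configuration after the session is created.
    spark.conf.getAll.foreach { case (k, v) => println(s"$k = $v") }

    spark.stop()
  }
}
```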
Why are there so many ways to do it, and what are the differences between them? Is there a best practice?