I've defined a default configuration in my Spark application, tucked away in src/main/resources/reference.conf, and I use ConfigFactory.load() to obtain the configuration. When I run the application with spark-submit it picks up these defaults. However, when I only want to override a few of the settings available in reference.conf and provide an application.conf with just those overrides, it does not seem to pick them up. From the documentation I thought that application.conf is merged with reference.conf when calling load(), so that it's not necessary to re-define everything in application.conf.
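For context, this is roughly how the configuration is read in the driver; it's a minimal sketch, and the object and key names are illustrative rather than my actual code:

import com.typesafe.config.{Config, ConfigFactory}

object Settings {
  // load() layers application.conf (if found on the classpath) on top of
  // reference.conf, so keys missing from application.conf should fall back
  // to the defaults defined in reference.conf.
  val config: Config = ConfigFactory.load()

  val hdfsRootDir: String = config.getString("hdfs.rootDir")
  val hdfsDataDir: String = config.getString("hdfs.dataDir")
  val dbDriver: String    = config.getString("db.driver")
}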
My reference.conf looks like this:
hdfs {
  rootDir: "/foo"
  dataDir: "hdfs://"${hdfs.rootDir}"/bar"
}

db {
  driver: "com.mysql.jdbc.Driver"
  ...
}

...
What I'd now like to do is have an application.conf with, say, only a custom hdfs section, because the rest is the same.
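For instance, something along these lines (the values here are made up), leaving db and everything else to come from reference.conf:

hdfs {
  rootDir: "/custom/foo"
  dataDir: "hdfs://"${hdfs.rootDir}"/custom-bar"
}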
I run my Spark app by supplying application.conf via the --files parameter, in --driver-class-path, and in --conf spark.executor.extraClassPath. This may be overkill, but it works when I create a copy of reference.conf and change a few of the fields.
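Concretely, the invocation looks roughly like this; the master, class name, and paths are placeholders for my real ones:

spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --deploy-mode cluster \
  --files /local/path/to/application.conf \
  --driver-class-path /local/path/to/application.conf \
  --conf "spark.executor.extraClassPath=/local/path/to/application.conf" \
  my-app-assembly.jar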
What am I missing?
Comments:

…conf? That one is colon-separated; files is comma-separated. – Ian

What you pass to --driver-class-path isn't the same as what you pass for spark.executor.extraClassPath: if you set, for example, --driver-class-path "/opt/bla/application.conf", the equivalent for the executor need only be --conf "spark.executor.extraClassPath=application.conf", since --files will dump the file in the working directory where the executor launches the uber JAR. – Yuval Itzchakov

Ah, so it ends up in the executor's working directory via the files parameter. Got it. Would this be OK then: spark.executor.extraClassPath=/usr/lib/foo/bar.jar:/foo/bar/application.conf? – Ian

You still need the --files parameter in order to send application.conf to the worker nodes. The path needs to be spark.executor.extraClassPath=bar.jar:application.conf. – Yuval Itzchakov
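Putting those comments together, the fix is apparently to keep shipping application.conf with --files, give the driver the full local path, and give the executors only the bare file name, since --files drops the file into each executor's working directory. Roughly (paths and names are still placeholders):

spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --deploy-mode cluster \
  --files /local/path/to/application.conf \
  --driver-class-path /local/path/to/application.conf \
  --conf "spark.executor.extraClassPath=application.conf" \
  my-app-assembly.jar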