I'm struggling to use multiple Typesafe Config files (combined via "include") in a Spark application that I submit to a YARN queue in cluster mode. I have two config files, laid out as follows:
- env.properties
- application-txn.conf (uses an "include" to reference the file above)
Both files live outside my app.jar, so I ship them to YARN with the --files option (see the spark-submit command below).
I use the Typesafe Config library to parse application-txn.conf, and in it I try to use a property from env.properties via substitution, but the variable never gets resolved :( and I'm not sure why.
env.properties:
```properties
txn.hdfs.fs.home=hdfs://dev/1234/data
```
application-txn.conf:
```hocon
# application-txn.conf
include required(file("env.properties"))
app {
  raw-data-location = "${txn.hdfs.fs.home}/input/txn-raw"
}
```
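For comparison, here is a minimal Scala sketch of how HOCON treats `${...}` depending on quoting. The HOCON spec states that substitutions are not parsed inside quoted strings, so a value written as `"${txn.hdfs.fs.home}/input/txn-raw"` may simply stay literal text after `resolve()`. The sketch assumes Typesafe Config is on the classpath, stands in for env.properties with an inline string parsed in properties syntax, and uses `withFallback` as an approximation of what the include does; `SubstitutionDemo` and `resolveWith` are hypothetical names.

```scala
import com.typesafe.config.{Config, ConfigFactory, ConfigParseOptions, ConfigSyntax}

object SubstitutionDemo {
  // Stand-in for env.properties: a dotted key in properties syntax becomes a nested path
  private val env: Config = ConfigFactory.parseString(
    "txn.hdfs.fs.home=hdfs://dev/1234/data",
    ConfigParseOptions.defaults().setSyntax(ConfigSyntax.PROPERTIES))

  // Parse one HOCON line, merge the properties in as a fallback (approximating
  // the include), resolve substitutions, and read the app key back out
  private def resolveWith(hocon: String): String =
    ConfigFactory.parseString(hocon).withFallback(env).resolve()
      .getString("app.raw-data-location")

  // ${...} inside a quoted string is plain text and is left untouched by resolve()
  val quotedValue: String =
    resolveWith("app.raw-data-location = \"${txn.hdfs.fs.home}/input/txn-raw\"")

  // ${...} outside the quotes, concatenated with a quoted suffix, is substituted
  val unquotedValue: String =
    resolveWith("app.raw-data-location = ${txn.hdfs.fs.home}\"/input/txn-raw\"")

  def main(args: Array[String]): Unit = {
    println(quotedValue)
    println(unquotedValue)
  }
}
```

The second form (substitution followed directly by a quoted string, with no space) relies on HOCON value concatenation to build the full path.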
Spark Application Code:
```scala
package com.nic.cage.app

import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.sql.SparkSession

object Transaction {
  def main(args: Array[String]): Unit = {
    // propFile in loadConf maps to "application-txn.conf"
    val config = loadConf("application-txn.conf")
    val spark = SparkSession.builder.getOrCreate()
    // Code fails here:
    val inputDF = spark.read.parquet(config.getString("app.raw-data-location"))
  }

  def loadConf(propFile: String): Config = {
    ConfigFactory.load() // note: the returned Config is discarded
    val cnf = ConfigFactory.parseResources(propFile)
    cnf.resolve()
  }
}
```
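`parseResources` looks the file up on the classpath. As an assumption to test rather than a confirmed fix, a variant that reads the file by filesystem path could be compared against it, since files passed with --files are, as I understand it, localized into the YARN container's working directory. `FileConfigLoader` and `loadConfFromFile` are hypothetical names; `parseFile`, `ConfigParseOptions.setAllowMissing` and `resolve` are existing Typesafe Config calls.

```scala
import java.io.File
import com.typesafe.config.{Config, ConfigFactory, ConfigParseOptions}

object FileConfigLoader {
  // Hypothetical alternative to loadConf: read the file by filesystem path
  // (e.g. relative to the container's working directory) instead of the classpath.
  // setAllowMissing(false) makes a missing file an error rather than an empty config.
  def loadConfFromFile(path: String): Config = {
    val opts = ConfigParseOptions.defaults().setAllowMissing(false)
    ConfigFactory.parseFile(new File(path), opts).resolve()
  }
}
```

In the driver this would be called as `FileConfigLoader.loadConfFromFile("application-txn.conf")`; failing fast on a missing file makes it obvious whether the path assumption holds.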
spark-submit command (called from a shell script):
```shell
spark-submit --class com.nic.cage.app.Transaction \
  --master yarn \
  --queue QUEUE_1 \
  --deploy-mode cluster \
  --name MyTestApp \
  --files application-txn.conf,env.properties \
  --jars <Typesafe Config 1.3.3 jar>,<app.jar> \
  --executor-memory 2g \
  --executor-cores 2 \
  app.jar application-txn.conf
```
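To confirm what actually lands next to the driver in cluster mode, a small hypothetical check (`ShippedFilesCheck`, not part of the original app) could log the working directory and any missing files at the start of `main`. It only uses `java.io.File` and assumes the shipped files keep their original names.

```scala
import java.io.File

object ShippedFilesCheck {
  // Names from the expected list that are not regular files in `dir`
  def missingFiles(names: Seq[String], dir: File): Seq[String] =
    names.filterNot(name => new File(dir, name).isFile)

  def main(args: Array[String]): Unit = {
    // Hypothetical debug step to run at the start of the driver's main
    val cwd = new File(".").getAbsoluteFile
    val expected = Seq("application-txn.conf", "env.properties")
    println(s"Working directory: ${cwd.getPath}")
    println(s"Missing shipped files: ${missingFiles(expected, cwd).mkString(", ")}")
  }
}
```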
When I run this, the config file parses, but the app fails when reading from HDFS because it cannot find a directory named: ${txn.hdfs.fs.home}/input/txn-raw
I believe the config is reading both files; otherwise the "required" keyword would make parsing fail. I verified this by adding another include statement pointing to a dummy file name, and the application then failed while parsing the config. Really not sure what's going on right now :(.
Any ideas what could be causing the resolution to fail? In case it helps: when I run locally with multiple config files, the resolution works fine.