
I'm struggling to use multiple Typesafe config files (combined via "include") in a Spark application that I submit to a YARN queue in cluster mode. I have two config files; their contents are shown below:

  1. env.properties
  2. application-txn.conf (this file uses an "include" to reference the above one)

Both files are external to my application jar, so I pass them to YARN using "--files" (see the spark-submit command below).

I am using the Typesafe config library to parse "application-txn.conf", and in that file I am trying to use a property from env.properties via substitution, but the variable does not get resolved :( and I'm not sure why.

env.properties

txn.hdfs.fs.home=hdfs://dev/1234/data

application-txn.conf:

# application-txn.conf
include required(file("env.properties"))

app {
  raw-data-location = "${txn.hdfs.fs.home}/input/txn-raw"
}

Spark Application Code:


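// propFile in loadConf below is "application-txn.conf", passed in from the app's main method

import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.sql.SparkSession

def main(args: Array[String]): Unit = {
  val config = loadConf("application-txn.conf")
  val spark = SparkSession.builder.getOrCreate()

  // Code fails here:
  val inputDF = spark.read.parquet(config.getString("app.raw-data-location"))
}

def loadConf(propFile: String): Config = {
  // Note: the result of this call is discarded
  ConfigFactory.load()
  // Parse propFile from the classpath, then resolve substitutions
  val cnf = ConfigFactory.parseResources(propFile)
  cnf.resolve()
}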

Spark Submit Code (called from a shell script):

spark-submit --class com.nic.cage.app.Transaction \
--master yarn \
--queue QUEUE_1 \
--deploy-mode cluster \
--name MyTestApp \
--files application-txn.conf,env.properties \
--jars #Typesafe config 1.3.3 and my app.jar go here \
--executor-memory 2g \
--executor-cores 2 \
app.jar application-txn.conf 

When I run the above, the config file parses, but my app fails when it tries to read from HDFS because it cannot find a directory named: ${txn.hdfs.fs.home}/input/txn-raw

I believe that the config is actually able to read both files...or else it would fail because of the "required" keyword. I verified this by adding another include statement with a dummy file name, and the application failed on parsing of the config. Really not sure what's going on right now :(.

Any ideas what could be causing this resolution to fail? If it helps: When I run locally with multiple config files, the resolution works fine


1 Answer


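The syntax in application-txn.conf is wrong. HOCON does not resolve substitutions inside quoted strings, so "${txn.hdfs.fs.home}/input/txn-raw" is kept as a literal string and reaches Spark unchanged.

The variable should be outside the quotes, with the rest of the path concatenated after it, like so: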

raw-data-location = ${txn.hdfs.fs.home}"/input/txn-raw"
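To see the difference without submitting a job, here is a minimal sketch that parses both forms from an inline string. The object name SubstitutionDemo and the keys app.quoted and app.unquoted are made up for illustration, and the inline string only stands in for env.properties plus application-txn.conf; it assumes the Typesafe config library is on the classpath.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical demo object; not part of the original application.
object SubstitutionDemo {
  // Inline stand-in for env.properties and application-txn.conf.
  // The HDFS path is quoted because "//" would start a comment in an unquoted value.
  val hocon: String =
    """
      |txn.hdfs.fs.home = "hdfs://dev/1234/data"
      |app {
      |  quoted   = "${txn.hdfs.fs.home}/input/txn-raw"
      |  unquoted = ${txn.hdfs.fs.home}"/input/txn-raw"
      |}
      |""".stripMargin

  // parseString only parses; resolve() is what replaces ${...} substitutions.
  val config: Config = ConfigFactory.parseString(hocon).resolve()

  def main(args: Array[String]): Unit = {
    // Substitution inside quotes is left as literal text.
    println(config.getString("app.quoted"))
    // Substitution outside quotes is resolved and concatenated with the suffix.
    println(config.getString("app.unquoted"))
  }
}
```

With this input, app.quoted should come back as the literal text ${txn.hdfs.fs.home}/input/txn-raw, while app.unquoted should come back as hdfs://dev/1234/data/input/txn-raw.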