
I am trying to run a batch process with Spark on DC/OS on AWS. For each batch run I send some specific parameters with spark-submit (for example, which users to run the batch process for).

I have a Spark cluster on DC/OS, with one master and 3 private nodes.

I have created an application.conf file, uploaded it to S3, and enabled the permissions for accessing that file.

My spark-submit command looks like this:

dcos spark run --submit-args='-Dspark.mesos.coarse=true --driver-class-path https://path_to_the_folder_root_where_is_the_file --conf spark.driver.extraJavaOptions=-Dconfig.file=application.conf --conf spark.executor.extraJavaOptions=-Dconfig.file=application.conf --class class_name jar_location_on_S3'

And I get an error that the configuration setting is not found, as if the application.conf file was never loaded:

Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'wattio-batch'
    at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
    at com.typesafe.config.impl.SimpleConfig.getObject(SimpleConfig.java:218)
    at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:224)
    at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:33)
    at com.enerbyte.spark.jobs.wattiobatch.WattioBatchJob$.main(WattioBatchJob.scala:31)
    at com.enerbyte.spark.jobs.wattiobatch.WattioBatchJob.main(WattioBatchJob.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:786)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:183)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:208)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:123)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
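For reference, the lookup that fails (WattioBatchJob.scala:31 in the trace) is roughly the following; this is a simplified sketch, with only the object name and the config key taken from the trace, the rest assumed:

    import com.typesafe.config.ConfigFactory

    object WattioBatchJob {
      def main(args: Array[String]): Unit = {
        // ConfigFactory.load() honors -Dconfig.file, but if the named file
        // cannot be found it falls back to the classpath defaults rather
        // than failing at load time, so the getConfig call below is what
        // actually throws ConfigException$Missing.
        val config = ConfigFactory.load()
        val batchConfig = config.getConfig("wattio-batch") // WattioBatchJob.scala:31
        // ... drive the batch run from batchConfig ...
      }
    }

That the error is a missing key rather than an I/O error makes me suspect application.conf never reaches the driver's working directory; as far as I know, --driver-class-path entries are not downloaded by Spark the way --jars and --files are.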

How do I set this up properly? Although one of the private nodes executes the driver, does it have access to the Internet (is it able to reach S3 and download the conf file)?

Thank you


1 Answer


I didn't succeed in passing the conf file through the spark-submit command, but what I did instead was hard-code the location of the application.conf file at the beginning of my program:

    import com.typesafe.config.ConfigFactory

    // Point Typesafe Config at the file on S3 and drop any cached config,
    // so the next ConfigFactory.load() re-reads it from that URL.
    System.setProperty("config.url", "https://s3_location/application.conf")
    ConfigFactory.invalidateCaches()

This way, the program was able to read the application.conf file at every launch.
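For completeness, here is a sketch of how the whole thing fits together, assuming the config URL is passed as the first program argument rather than hard-coded (the argument handling here is hypothetical, not from my actual job):

    import com.typesafe.config.ConfigFactory

    object WattioBatchJob {
      def main(args: Array[String]): Unit = {
        // Hypothetical: take the config URL as the first program argument,
        // falling back to the hard-coded S3 location from above.
        val configUrl = args.headOption.getOrElse("https://s3_location/application.conf")
        System.setProperty("config.url", configUrl)
        ConfigFactory.invalidateCaches()

        // From here on, load() reads the configuration from that URL.
        val config = ConfigFactory.load()
        val batchConfig = config.getConfig("wattio-batch")
        // ... run the batch for the users specified in batchConfig ...
      }
    }

The extra argument should just go after the jar location inside --submit-args, the same way program arguments are passed to a plain spark-submit.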