
I am attempting to run a Spark application on AWS EMR in client mode. I have set up a bootstrap action to import the needed files and the jar from S3, and I have a step that runs a single Spark job.

However, when the step executes, the jar I imported isn't found. Here is the stderr output:

19/12/01 13:42:05 WARN DependencyUtils: Local jar /mnt/var/lib/hadoop/steps/s-2HLX7KPZCA07B/~/myApplicationDirectory does not exist, skipping.

I am able to successfully import the jar and the other files the application needs from my S3 bucket to the master instance; the bootstrap action simply copies them to /home/ec2-user/myApplicationDirectory/myJar.jar.
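For context, the bootstrap action is essentially just S3 copies, along the lines of the sketch below (the bucket name is a placeholder):

#!/bin/bash
# Bootstrap action: copy the application jar and supporting files
# from S3 to the master instance's home directory.
mkdir -p /home/ec2-user/myApplicationDirectory
aws s3 cp s3://my-bucket/myJar.jar /home/ec2-user/myApplicationDirectory/myJar.jar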

However, I don't understand why the step is looking for the jar under /mnt/var/lib/hadoop/... instead.

Here are the relevant parts of the CLI configuration:

--steps '[{"Args":["spark-submit",
"--deploy-mode","client",
"--num-executors","1",
“--driver-java-options","-Xss4M",
"--conf","spark.driver.maxResultSize=20g",
"--class”,”myApplicationClass”,
“~/myApplicationDirectory”,
“myJar.jar",
…
   application specific arguments and paths to folders here 
…],
”Type":"CUSTOM_JAR",

Thanks for any help.


1 Answer


It looks like the ~ isn't being expanded to your home directory. The step's arguments are handed to spark-submit directly rather than through a shell, so no tilde expansion takes place: ~/myApplicationDirectory is treated as a literal relative path and resolved against the step's working directory (/mnt/var/lib/hadoop/steps/<step-id>/), which is exactly the path in your error message. Try changing "~/myApplicationDirectory" to the absolute path "/home/ec2-user/myApplicationDirectory".
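Note also that spark-submit expects the application jar as a single argument, so the directory and the jar name should probably be joined rather than passed as two separate arguments (the error message shows spark-submit treating "~/myApplicationDirectory" alone as the jar). A sketch of the corrected step, keeping the class name and jar name from your question and leaving the elided arguments as they are:

--steps '[{"Args":["spark-submit",
"--deploy-mode","client",
"--num-executors","1",
"--driver-java-options","-Xss4M",
"--conf","spark.driver.maxResultSize=20g",
"--class","myApplicationClass",
"/home/ec2-user/myApplicationDirectory/myJar.jar",
…
   application specific arguments and paths to folders here
…],
"Type":"CUSTOM_JAR",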


A little warning: in the sample in your question, straight quotation marks (") are mixed with "smart" ones (“ and ”). Make sure the smart quotation marks don't end up in your actual configuration, or you will get very confusing error messages.
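If you want to check, a quick way is to flag every character outside printable ASCII (smart quotes included) in whatever file holds the configuration; the file name here is a placeholder:

LC_ALL=C grep -n '[^ -~]' steps.json

Any line it prints is worth inspecting by hand. Note that it flags all non-ASCII characters (and tabs), not just quotation marks.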