I came across a scenario where supplying spark.yarn.stagingDir to spark-submit makes the job fail, and the error gives no clue about the root cause. It took me quite a long time to figure out that the spark.yarn.stagingDir parameter was responsible. Why does spark-submit fail when I supply this parameter?
See the related question here for more details.
Command that fails:
spark-submit \
--conf "spark.yarn.stagingDir=/xyz/warehouse/spark" \
--queue xyz \
--class com.xyz.TestJob \
--master yarn \
--deploy-mode cluster \
--conf "spark.local.dir=/xyz/warehouse/tmp" \
/xyzpath/java-test-1.0-SNAPSHOT.jar
When I remove spark.yarn.stagingDir, it starts working:
spark-submit \
--queue xyz \
--class com.xyz.TestJob \
--master yarn \
--deploy-mode cluster \
--conf "spark.local.dir=/xyz/warehouse/tmp" \
/xyzpath/java-test-1.0-SNAPSHOT.jar
Exception stack trace:
Application application_1506717704791_145448 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1167)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1213)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
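In cluster mode this client-side trace only says that the application finished with a FAILED status. The actual error is in the driver/ApplicationMaster container logs. The sketch below collects a few checks that may help narrow it down. It is not a confirmed diagnosis. The base directory and application ID are copied from the failing run above. The `<stagingDir>/.sparkStaging/<applicationId>` layout is an assumption based on the Spark 2.x YARN client, and newer releases may add a per-user component to the path. `app_staging_dir` is a hypothetical helper, not part of Spark.

```shell
#!/usr/bin/env bash
# Values taken from the failing run above; adjust for your cluster.
STAGING_BASE="/xyz/warehouse/spark"
APP_ID="application_1506717704791_145448"

# Hypothetical helper. Assumption: the Spark 2.x YARN client uploads the
# application jar and config to <spark.yarn.stagingDir>/.sparkStaging/<appId>.
# Strips one trailing slash from the base dir before joining.
app_staging_dir() {
  printf '%s/.sparkStaging/%s\n' "${1%/}" "$2"
}

echo "Expected staging dir: $(app_staging_dir "$STAGING_BASE" "$APP_ID")"

# Only touch the cluster where the Hadoop CLIs are installed.
if command -v hdfs >/dev/null 2>&1; then
  # Does the base directory exist, and do its owner/permissions allow the
  # submitting user (and the AM) to write there?
  hdfs dfs -ls -d "$STAGING_BASE"
  # Aggregated container logs, including the driver/AM error that the
  # client-side stack trace does not show.
  yarn logs -applicationId "$APP_ID"
fi
```

On a machine without `hdfs` on the PATH, the script only prints the expected staging directory.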