Failing to connect to Spark driver when submitting a job to Spark in YARN mode

Question

When I submit a Spark job to the cluster, it fails with the following exception in the shell:

> Exception in thread "main" org.apache.spark.SparkException:
> Application application_1497125798633_0065 finished with failed status
>         at org.apache.spark.deploy.yarn.Client.run(Client.scala:1244)
>         at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1290)
>         at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:750)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 17/06/29 10:25:36 INFO ShutdownHookManager: Shutdown hook called

This is what appears in the YARN logs:

> Caused by: java.io.IOException: Failed to connect to /0.0.0.0:35994
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
>         at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
>         at org.apache.spark.rpc.netty.Outbox$anon$1.call(Outbox.scala:194)
>         at org.apache.spark.rpc.netty.Outbox$anon$1.call(Outbox.scala:190)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)

I guess this means it failed to connect to the driver. I tried increasing the "spark.yarn.executor.memoryOverhead" parameter, but that didn't help.

This is the submit command I use:

/bin/spark-submit \
  --class example.Hello \
  --jars ... \
  --master yarn \
  --deploy-mode cluster \
  --supervise \
  --conf spark.yarn.driver.memoryOverhead=1024 ...(jar file path)

I am using HDP-2.6.1.0 and Spark 2.1.1.

Can you remove --supervise and start over? Can you paste the entire output from spark-shell? Can you paste the logs from YARN? Use yarn logs -applicationId. — Jacek Laskowski
I removed it from the command and nothing changed. I updated my question to have the shell error as well as the exception in yarn logs — tariq abughofa
Can you paste the other lines above the exception? I think I've seen a similar exception and it was at shutdown. Show more logs. Thanks. — Jacek Laskowski
thank you for replying @JacekLaskowski but I found the problem — tariq abughofa

tariq abughofa · Accepted Answer · 2017-07-01T17:16:29

Running Spark in YARN mode (which is what I was doing) is the right way to use Spark on HDP, as stated here: https://community.hortonworks.com/questions/52591/standalone-spark-using-ambari.html

This means I should not specify a master or use the start-master / start-slave commands.

The problem was that the driver IP was being taken as 0.0.0.0 for some reason, so all the cluster nodes tried to contact the driver on their own local interface and failed. I fixed this by setting the following configuration in conf/spark-defaults.conf:

spark.driver.port=20002

spark.driver.host=HOST_NAME

and by changing the deploy mode to client so that the driver is deployed locally.
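
The same two settings can probably also be passed per job with --conf instead of editing conf/spark-defaults.conf. The following is only a minimal sketch of what such a client-mode submit might look like, not the exact command used above: the build_submit_args helper, the DRIVER_HOST and APP_JAR variables, and app.jar are hypothetical names introduced for illustration, and HOST_NAME remains a placeholder for an address the cluster nodes can actually reach. The script prints the command rather than running it, so it can be checked before use.

```shell
#!/usr/bin/env bash
# Sketch only: assembles the arguments for a client-mode submit and prints
# them instead of executing spark-submit.
# build_submit_args, DRIVER_HOST, APP_JAR and app.jar are hypothetical names.

build_submit_args() {
  local driver_host="$1"   # address the cluster nodes should use to reach the driver
  local app_jar="$2"       # path to the application jar (placeholder)
  printf '%s\n' \
    --class example.Hello \
    --master yarn \
    --deploy-mode client \
    --conf "spark.driver.host=${driver_host}" \
    --conf "spark.driver.port=20002" \
    "${app_jar}"
}

# Dry run: show the command that would be executed.
echo "spark-submit $(build_submit_args "${DRIVER_HOST:-HOST_NAME}" "${APP_JAR:-app.jar}" | tr '\n' ' ')"
```

Fixing the driver port as well as the host is a choice carried over from the configuration above; it mainly helps when a firewall between the nodes and the submitting machine only allows known ports.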
