
From the official Spark documentation (http://spark.apache.org/docs/1.2.0/running-on-yarn.html):

In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.

Is there a way for the client to reconnect to the driver at some later point to collect the results?


2 Answers

0 votes

No simple way that I know of.

Broadly, yarn-cluster mode makes sense for production jobs, while yarn-client mode makes sense for interactive and debugging uses where you want to see your application’s output immediately.
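
For reference, with a recent Spark the mode is chosen at submit time (the class and jar names below are placeholders):

spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp myapp.jar
spark-submit --master yarn --deploy-mode client --class com.example.MyApp myapp.jar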

In a production job, the simplest approach is probably to have your driver ship the results somewhere once it has them (e.g. write them to HDFS, or log them).
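
As a minimal sketch of that pattern (the app name, output path, and computation are placeholder assumptions, not from the question), a yarn-cluster driver could persist its results to HDFS before exiting:

import org.apache.spark.{SparkConf, SparkContext}

object ResultShipper {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ResultShipper"))
    // Placeholder computation; substitute the real job here.
    val results = sc.parallelize(1 to 100).map(_ * 2)
    // Ship the results to HDFS so a client can pick them up later,
    // e.g. with: hdfs dfs -cat /user/me/results/part-*
    results.saveAsTextFile("hdfs:///user/me/results")
    sc.stop()
  }
}

The client can then fetch the output at any time with hdfs dfs -get /user/me/results, independently of the driver's lifetime.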

0 votes

Usually you can check the logs with

yarn logs -applicationId <app ID>

See https://spark.apache.org/docs/2.2.0/running-on-yarn.html:

If log aggregation is turned on (with the yarn.log-aggregation-enable config), container logs are copied to HDFS and deleted on the local machine. These logs can be viewed from anywhere on the cluster with the yarn logs command.
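
In yarn-site.xml, that setting would look like the following (assuming you administer the cluster configuration yourself; only the property name is taken from the quoted docs):

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>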

yarn logs -applicationId <app ID>

will print out the contents of all log files from all containers from the given application.
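
If you don't know the application ID, spark-submit prints it in the application report when the job is accepted, and yarn application -list -appStates ALL should also show the IDs of finished applications.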