
From the official Spark documentation (http://spark.apache.org/docs/1.2.0/running-on-yarn.html):

In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.

Is there a way for the client to reconnect to the driver at some later point to collect the results?


2 Answers

0 votes

No simple way that I know of.

Broadly, yarn-cluster mode makes sense for production jobs, while yarn-client mode makes sense for interactive and debugging uses where you want to see your application’s output immediately.
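
For reference, with a recent Spark the mode is chosen at submit time (the class and jar names below are placeholders):

spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp myapp.jar
spark-submit --master yarn --deploy-mode client --class com.example.MyApp myapp.jar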

In a production job, the simplest approach is probably to have your driver ship the results somewhere once it has them (e.g. write them to HDFS, or log them).
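
As a minimal sketch of that pattern (the app name, output path, and computation are placeholder assumptions, not from the question), a yarn-cluster driver could persist its results to HDFS before exiting:

import org.apache.spark.{SparkConf, SparkContext}

object ResultShipper {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ResultShipper"))
    // Placeholder computation; substitute the real job here.
    val results = sc.parallelize(1 to 100).map(_ * 2)
    // Ship the results to HDFS so a client can pick them up later,
    // e.g. with: hdfs dfs -cat /user/me/results/part-*
    results.saveAsTextFile("hdfs:///user/me/results")
    sc.stop()
  }
}

The client can then fetch the output at any time with hdfs dfs -get /user/me/results, independently of the driver's lifetime.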

0 votes

Usually you can check the logs with

yarn logs -applicationId <app ID>

See https://spark.apache.org/docs/2.2.0/running-on-yarn.html:

If log aggregation is turned on (with the yarn.log-aggregation-enable config), container logs are copied to HDFS and deleted on the local machine. These logs can be viewed from anywhere on the cluster with the yarn logs command.
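
In yarn-site.xml, that setting would look like the following (assuming you administer the cluster configuration yourself; only the property name is taken from the quoted docs):

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>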

yarn logs -applicationId <app ID>

will print out the contents of all log files from all containers from the given application.
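
If you don't know the application ID, spark-submit prints it in the application report when the job is accepted, and yarn application -list -appStates ALL should also show the IDs of finished applications.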