2
votes

I have researched this for a significant amount of time and find answers that seem to be for a slightly different question than mine.

UPDATE: Spark docs say the Driver runs on a cluster Worker in deployMode: cluster. This does not seem to be true when you don't use spark-submit

My Spark 2.3.3 cluster is running fine. I see the GUI on “http://master-address:8080", there are 2 idle workers, as configured.

I have a Scala application that creates a context and starts a Job. I do not use spark-submit, I start the Job programmatically and this is where many answers diverge from my question.

In "my-app" I create a new SparkConf, with the following code (slightly abbreviated):

  conf.setAppName(“my-job")
  conf.setMaster(“spark://master-address:7077”)
  conf.set(“deployMode”, “cluster”)
  // other settings like driver and executor memory requests
  // the driver and executor memory requests are for all mem on the slaves, more than 
  // mem available on the launching machine with “my-app"
  val jars = listJars(“/path/to/lib")
  conf.setJars(jars)
  …

When I launch the job I see 2 executors running on the 2 nodes/workers/slaves. The logs show their IP address and calls them executor 0 and 1.

With a Yarn cluster I would expect the “Driver" to run on/in the Yarn Master but I am using the Spark Standalone Master, where is the Driver part of the Job running? If it runs on a random worker or elsewhere, is there a way to find it from logs

Where is my Spark Driver executing? Does deployMode = cluster work when not using spark-submit? Evidence shows a cluster with one master (on the same machine as executor 0) and 2 Workers. It also show identical memory usage on both Workers during the job. From logs I know both Workers are running Executors. Where is the Driver?

The “Driver” creates and broadcasts some large data structures so the need for an answer is more critical than with more typical tiny Drivers.

Where is the driver running? How do I find it given logs and monitoring? I can't reconcile what I see with the docs, they contradict each other.

1

1 Answers

1
votes

This is answered by the official documentation:

In cluster mode, however, the driver is launched from one of the Worker processes inside the cluster, and the client process exits as soon as it fulfills its responsibility of submitting the application without waiting for the application to finish.

In other words driver uses arbitrary worker node, hence it it is likely to co-locate with one on the executors, on such small cluster. And to anticipate the follow-up question - this behavior is not configurable - you just have to make sure that the cluster has capacity to start both required executors, and the driver with it's requested memory and cores.