I have researched this for a significant amount of time and find answers that seem to be for a slightly different question than mine.
UPDATE: Spark docs say the Driver runs on a cluster Worker in deployMode: cluster. This does not seem to be true when you don't use spark-submit
My Spark 2.3.3 cluster is running fine. I see the GUI on “http://master-address:8080", there are 2 idle workers, as configured.
I have a Scala application that creates a context and starts a Job. I do not use spark-submit, I start the Job programmatically and this is where many answers diverge from my question.
In "my-app" I create a new SparkConf, with the following code (slightly abbreviated):
conf.setAppName(“my-job")
conf.setMaster(“spark://master-address:7077”)
conf.set(“deployMode”, “cluster”)
// other settings like driver and executor memory requests
// the driver and executor memory requests are for all mem on the slaves, more than
// mem available on the launching machine with “my-app"
val jars = listJars(“/path/to/lib")
conf.setJars(jars)
…
When I launch the job I see 2 executors running on the 2 nodes/workers/slaves. The logs show their IP address and calls them executor 0 and 1.
With a Yarn cluster I would expect the “Driver" to run on/in the Yarn Master but I am using the Spark Standalone Master, where is the Driver part of the Job running? If it runs on a random worker or elsewhere, is there a way to find it from logs
Where is my Spark Driver executing? Does deployMode
= cluster
work when not using spark-submit? Evidence shows a cluster with one master (on the same machine as executor 0) and 2 Workers. It also show identical memory usage on both Workers during the job. From logs I know both Workers are running Executors. Where is the Driver?
The “Driver” creates and broadcasts some large data structures so the need for an answer is more critical than with more typical tiny Drivers.
Where is the driver running? How do I find it given logs and monitoring? I can't reconcile what I see with the docs, they contradict each other.