3 votes

I am running a small Spark cluster with two EC2 instances (m4.xlarge).

So far I have been running the Spark master on one node and a single Spark slave (4 cores, 16 GB memory) on the other, then deploying my Spark Streaming app in client deploy-mode from the master node. A summary of the settings (a full command sketch follows the list):

--executor-memory 16g

--executor-cores 4

--driver-memory 8g

--driver-cores 2

--deploy-mode client
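
For reference, assembled into a full client-mode submission this looks roughly like the following (a sketch only; the master URL spark://<master-host>:7077 and the jar name my-streaming-app.jar are placeholders, not my actual values):

# Run from the master node; in client mode the driver stays on this machine
spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode client \
  --executor-memory 16g \
  --executor-cores 4 \
  --driver-memory 8g \
  --driver-cores 2 \
  my-streaming-app.jar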

This results in a single executor on my single slave node running with 4 cores and 16 GB of memory. The driver runs "outside" of the cluster on the master node (i.e. its resources are not allocated by the master).

Ideally I'd like to use cluster deploy-mode so that I can take advantage of the --supervise option. I have started a second slave on the master node, giving it 2 cores and 8 GB of memory (a smaller allocation so as to leave room for the master daemon).
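
For context, that second slave was started with reduced resources, roughly like this (a sketch assuming the stock standalone scripts; the master URL is a placeholder, and the same limits could instead be set via SPARK_WORKER_CORES and SPARK_WORKER_MEMORY in conf/spark-env.sh):

# Run on the master node: start a worker offering only 2 cores and 8 GB to the cluster
sbin/start-slave.sh spark://<master-host>:7077 --cores 2 --memory 8g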

When I run my Spark job in cluster deploy-mode (the same settings as above but with --deploy-mode cluster), around 50% of the time I get the desired deployment: the driver runs on the slave that lives on the master node (which has the right resources of 2 cores & 8 GB), leaving the original slave node free to allocate an executor of 4 cores & 16 GB. The other 50% of the time, however, the master runs the driver on the non-master slave node, so that node hosts a driver using 2 cores & 8 GB of memory, leaving no node with sufficient resources to start an executor (which requires 4 cores & 16 GB).
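
For completeness, the cluster-mode submission looks roughly like this (again only a sketch; the master URL and jar name are placeholders):

# Run from the master node; in cluster mode the master picks a worker to host the driver
spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 16g \
  --executor-cores 4 \
  --driver-memory 8g \
  --driver-cores 2 \
  my-streaming-app.jar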

Is there any way to force the Spark master to use a specific worker/slave for my driver? Given that Spark knows there are two slave nodes, one with 2 cores and the other with 4 cores, that my driver needs 2 cores, and that my executor needs 4 cores, it would ideally work out the optimal placement, but this doesn't seem to be the case.

Any ideas / suggestions gratefully received!

Thanks!

Any luck? I'm working on a similar problem and would also like to know if it's possible. – PablodeAcero
I am also facing the same problem. – desaiankitb
Found any solution to this? – Ravi Teja

1 Answer

1 vote

I can see that this is an old question, but let me answer it anyway; someone might find it useful.

Add the --driver-java-options="-Dspark.driver.host=<HOST>" option to your spark-submit command when submitting the application, and Spark should deploy the driver to the specified host.
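
For example, a submission along those lines might look like the following (a sketch; the master URL, target host, and jar name are placeholders). Since spark.driver.host is an ordinary Spark configuration property, setting it via --conf spark.driver.host=<HOST> should be equivalent:

# Cluster-mode submit, pinning the driver to the desired host per the option above
spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode cluster \
  --supervise \
  --driver-java-options="-Dspark.driver.host=<HOST>" \
  --executor-memory 16g \
  --executor-cores 4 \
  --driver-memory 8g \
  --driver-cores 2 \
  my-streaming-app.jar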