2 votes

I submit jobs to a Spark cluster on Dataproc (with Hadoop YARN), and I see that no matter which properties I set for spark.master and deploy mode, when I open the Spark UI, the Environment tab of the job always shows local for spark.master, and the different stages of the job always use the same executor ID, even when there is room for more.

e.g.:

gcloud dataproc jobs submit spark --cluster mycluster --region myregion --class MyApp --properties 'spark.executor.extraJavaOptions=-verbose:class,spark.master=yarn,spark.deploy-mode=cluster,spark.submit.deployMode=client,spark.executor.instances=2,spark.scheduler.mode=FIFO,spark.executor.cores=2,spark.dynamicAllocation.minExecutors=2'
You don't need to specify these properties; by default, Dataproc Spark uses YARN, client mode and dynamic allocation. – Dagang
Thanks for the response. However, the job environment shows "local"? Is that meaningless, @Dagang? – vejeta
How did you open the Spark UI? – Dagang
I set up a private SSH tunnel as described here: cloud.google.com/community/tutorials/ssh-tunnel-on-gce or here: cloud.google.com/dataproc/docs/concepts/accessing/… Then, through the browser, I can access the Spark UI and the Spark History Server: http://mymasternode:4040 http://mymasternode:10200 – vejeta
I had set .set("spark.master", "local") in the code, and that took precedence over the properties sent while submitting. – vejeta

1 Answer

2 votes

I had set .set("spark.master", "local") in the code, and that took precedence over the properties sent while submitting. Configuration set programmatically on the SparkConf / SparkSession builder overrides values passed via spark-submit or gcloud --properties, so removing the hardcoded master lets the cluster's default (YARN on Dataproc) take effect, and the job is no longer reported as local in the Environment tab.
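
As an illustration, here is a minimal Scala sketch of an entry point that leaves the master unset so the submit-time configuration wins; the object name MyApp mirrors the --class in the question, while the appName and the job body are assumptions made for the example:

    import org.apache.spark.sql.SparkSession

    object MyApp {
      def main(args: Array[String]): Unit = {
        // Do not call .master("local") here: values set in code take
        // precedence over anything passed with spark-submit or
        // gcloud --properties, so leaving the master unset lets
        // Dataproc's default (YARN) apply.
        val spark = SparkSession.builder()
          .appName("MyApp")
          .getOrCreate()

        // ... job logic ...

        spark.stop()
      }
    }

If a local master is only needed for unit tests or running from an IDE, it can be supplied at that point (for example with spark-submit --master 'local[*]') instead of being baked into the application code.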