
I deployed a PySpark ML model to a Google Cloud Dataproc cluster and it has been running for over an hour, even though my data is only about 800 MB.

Do I need to declare a master on my SparkSession? I currently set it to the default option 'local'.


1 Answer


When you pass a 'local' master to the SparkContext, your application executes locally on a single VM. To avoid this, do not set a master at all: Dataproc pre-configures Spark to run on YARN, so your application will use all of the cluster's nodes and resources.
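A minimal sketch of the difference, assuming a standard PySpark setup (the app name is illustrative):

```python
from pyspark.sql import SparkSession

# Problematic: forces the whole job onto a single VM's local threads,
# ignoring the YARN cluster that Dataproc provides.
spark = (
    SparkSession.builder
    .master("local")
    .appName("my-ml-job")
    .getOrCreate()
)

# On Dataproc, omit .master() entirely; the session picks up the
# pre-configured 'yarn' master from spark-defaults.conf and the job
# is distributed across all worker nodes.
spark = (
    SparkSession.builder
    .appName("my-ml-job")
    .getOrCreate()
)
```

When the job is submitted through `gcloud dataproc jobs submit pyspark`, Dataproc's defaults apply automatically, so no master needs to be specified in code.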