1 vote

I have a Dataproc 1.2 cluster, which currently runs Spark 2.2.0, but our program fails on a bug whose fix was introduced in Spark 2.2.1 and 2.3.0. Is there a way to upgrade the Spark version without impacting or breaking any of the dependencies in the current cluster?

Neither Spark 2.2.1 nor 2.3.0 has been officially released yet; 2.2.1 is just about to cut a release candidate. Which JIRA are you specifically interested in? - Dennis Huo
While the Spark release process is in progress, you could try downgrading to Dataproc 1.1, which is on Spark 2.0.* and wouldn't be affected by the bug. - Dennis Huo
@DennisHuo Are there any dates for when Spark 2.3.0 will be supported on Google Cloud Dataproc? I see 2.2.0 in preview as well. Thank you. - Pramod Sripada

2 Answers

0 votes

FYI, Spark 2.3 is available in Dataproc 1.3: https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions.

gcloud dataproc clusters create <clustername> --image-version=1.3
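
To double-check which Spark version the new cluster actually runs, one option (assuming the default master node name <clustername>-m that Dataproc assigns, and substituting your cluster's zone) is to SSH in and ask Spark directly:

gcloud compute ssh <clustername>-m --zone=<zone> --command='spark-submit --version'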

0 votes

You can upgrade Spark to the newer 2.3 version, but some built-in functionality stops working after the upgrade; for example, you can no longer open files directly from Google Cloud Storage.
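
To check whether that GCS integration still works after an upgrade, you could run a quick read against a gs:// path. A minimal PySpark sketch, where the bucket name and object path are hypothetical placeholders:

from pyspark.sql import SparkSession

# On Dataproc images the GCS connector is preinstalled, which is what
# makes the gs:// scheme resolvable; a hand-upgraded Spark may lose it.
spark = SparkSession.builder.appName("gcs-read-check").getOrCreate()

# "my-bucket" and the object path below are placeholders.
df = spark.read.text("gs://my-bucket/path/to/file.txt")
df.show(5, truncate=False)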

Here is the link where you can check the release dates of all versions.

They released version 2.3, but I haven't checked it yet.

I hope they changed the default version, because I want to use pandas_udf in PySpark.
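
For reference, a minimal sketch of what pandas_udf enables on Spark 2.3+. It assumes pyarrow is installed on the cluster workers; the function name and the column "x" are just placeholders:

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.appName("pandas-udf-check").getOrCreate()

# Scalar pandas UDF: receives a whole pandas Series per batch instead of
# one Python object per row, which is why it outperforms a plain UDF.
@pandas_udf("double", PandasUDFType.SCALAR)
def plus_one(v):
    return v + 1.0

df = spark.createDataFrame([(1.0,), (2.0,), (3.0,)], ["x"])
df.select(plus_one(df["x"]).alias("x_plus_one")).show()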