I recently started a Spark cluster on Google Cloud Dataproc using the 'preview' image. According to the documentation, the preview image ships Spark 2.1.0; however, running spark-shell --version shows that the cluster is actually running Spark 2.2.0. This is a problem for us because our version of spark-avro is not compatible with Spark 2.2.0. Is anyone else seeing this? I haven't been able to find any official announcement from Google about the version bump.
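
To confirm which Spark version a cluster actually runs, rather than relying on the documentation, a minimal sketch like the following could be sourced on the master node. The function names are hypothetical, and the parsing assumes the spark-shell banner contains "version X.Y.Z" for Spark before the "Using Scala version ..." line, as Spark 2.x banners do.

```shell
#!/usr/bin/env bash
# Sketch: check the Spark version a Dataproc cluster really runs.
# Parsing is kept separate from invocation so it can be checked against sample text.

# Read spark-shell's version banner on stdin and print the first "version X.Y.Z"
# found (the Spark version line comes before the "Using Scala version ..." line).
parse_spark_version() {
  grep -oE 'version [0-9]+\.[0-9]+\.[0-9]+' | head -n 1 | awk '{print $2}'
}

# Succeed only if $1 (e.g. "2.1.0") starts with the major.minor in $2 (e.g. "2.1").
spark_minor_matches() {
  local version="$1" expected_minor="$2"
  [[ "$version" == "$expected_minor".* ]]
}

# Hypothetical guard: return non-zero if the installed Spark is not the expected major.minor.
check_cluster_spark() {
  local expected_minor="$1" version
  # The banner may be printed on stderr, so merge the streams before parsing.
  version=$(spark-shell --version 2>&1 | parse_spark_version)
  if ! spark_minor_matches "$version" "$expected_minor"; then
    echo "Expected Spark ${expected_minor}.x but found ${version:-unknown}" >&2
    return 1
  fi
}
```

For example, check_cluster_spark 2.1 would return non-zero on the current preview image, which reports Spark 2.2.0, so it can be used to fail fast before submitting jobs that bundle a Spark 2.1 build of spark-avro.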

Sorry about that; it appears the minor release notes for the recent preview image update got lost in the ether. The documentation should hopefully be updated by tomorrow. You're right that the current Dataproc preview image now ships Spark 2.2.0. If you need to pin to an older preview image known to work, you can try:

gcloud dataproc clusters create CLUSTER_NAME --image https://www.googleapis.com/compute/v1/projects/cloud-dataproc/global/images/dataproc-1-2-20170227-145329

That image should contain Spark 2.1.0. That said, keep in mind that new preview images can always introduce incompatible changes, and pinning to that older preview image isn't guaranteed to keep working long term.
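
If cluster creation is scripted, one way to keep the pinned image URI in a single place is a small wrapper like the sketch below. The function name create_pinned_cluster and the DRY_RUN variable are hypothetical; only the gcloud dataproc clusters create --image usage and the image URI come from the command above.

```shell
#!/usr/bin/env bash
# Sketch: create a Dataproc cluster pinned to the older preview image (Spark 2.1.0).
# Pinning is not guaranteed to keep working as preview images change.
set -euo pipefail

PINNED_IMAGE="https://www.googleapis.com/compute/v1/projects/cloud-dataproc/global/images/dataproc-1-2-20170227-145329"

# Usage: create_pinned_cluster CLUSTER_NAME
# Set DRY_RUN=1 to print the gcloud command instead of executing it.
create_pinned_cluster() {
  local cluster_name="${1:?cluster name required}"
  local cmd=(gcloud dataproc clusters create "$cluster_name" --image "$PINNED_IMAGE")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print each argument shell-quoted, followed by a newline.
    printf '%q ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 create_pinned_cluster my-cluster prints the full command without contacting Google Cloud; calling it without DRY_RUN actually creates the cluster.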

In your case, do you happen to know whether you're hitting this issue filed on spark-avro, or is it something specific to your version? Ideally we should get you onto Spark 2.2, since an official (non-preview) image version with Spark 2.2 is imminent.