1
votes

Currently I am running my Dataproc cluster in a Europe region, and I am running a Spark application on that cluster. When the application writes to a bucket through the Google Cloud Storage connector in Spark, the bucket is automatically created with the Multi-Regional storage class and the Multiple Regions in US location.

I am writing the file using:

dataframe.write.mode(...).save("gs://location")

This creates a new bucket at that location with the properties mentioned above.

I tried to find a configuration option in the connector to set the storage class, but without success. How can we resolve this?

I don't understand your question. You want to create the bucket in Europe, but Spark creates it in the US? So you want to know how to create it in Europe? – howie
Yes, the Google Cloud Storage connector does not provide any option to specify the storage class or region. – Sarang Shinde
You should create the bucket (in a Europe region) first, then use that bucket. – howie
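
A minimal sketch of that pre-creation step, assuming a hypothetical bucket name my-europe-bucket, the europe-west1 region, and the Standard storage class:

# Create a bucket in a Europe region with an explicit storage class
# (bucket name, region, and class are placeholders, not from the question)
gsutil mb -l europe-west1 -c standard gs://my-europe-bucket/

Objects that the Spark job later writes under gs://my-europe-bucket/ simply inherit the bucket's location and default storage class, so the connector never has to pick them.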

2 Answers

2
votes

From the documentation on the Cloud Dataproc staging bucket:

When you create a cluster, by default, Cloud Dataproc will create a Cloud Storage staging bucket in your project or reuse an existing Cloud Dataproc-created staging bucket from a previous cluster creation request. This bucket is used to stage cluster job dependencies, job driver output, and cluster config files. Instead of relying on the creation of a default staging bucket, you can specify an existing Cloud Storage bucket that Cloud Dataproc will use as your cluster's staging bucket.

If you create the Dataproc cluster from the command line, try adding --region=REGION:

gcloud dataproc clusters create cluster-name --region region ...
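
A sketch that combines the region flag with an explicitly specified staging bucket, assuming a hypothetical cluster name and the bucket created earlier:

# Create the cluster in a Europe region and reuse a pre-created staging bucket
# (cluster name, bucket name, and region are placeholders)
gcloud dataproc clusters create my-cluster --region=europe-west1 --bucket=my-europe-bucket

With --bucket, Dataproc stages job dependencies and driver output in the bucket you created instead of generating a default staging bucket for you.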

0
votes

The Google Cloud Storage connector does not support bucket location configuration.

Usually this is not an issue, because users write into existing buckets that already have the storage location they need.
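
A minimal sketch of that approach from the Spark side, assuming PySpark and the hypothetical pre-created bucket from above:

from pyspark.sql import SparkSession

# Write into a bucket that already exists in the desired region and storage class
spark = SparkSession.builder.appName("write-to-europe-bucket").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").parquet("gs://my-europe-bucket/output/")

Because the bucket already exists, the connector only writes objects into it and never needs to choose a location or storage class itself.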