@dennis-huo
Using non-default service account in Google Cloud Dataproc
In continuation of the problem above:

I want to set up a Dataproc cluster for multiple users. The Compute Engine VMs of a Dataproc cluster connect to storage buckets using the default service account or a custom service account's credentials (e.g. via --properties core:fs.gs.auth.service.account.json.keyfile). Those credentials have no relation to the user principal who submits the jobs, and I couldn't find an option to control this. That makes the Dataproc cluster insecure in a multi-user environment: it introduces another level of indirection, because the key file used does not correspond to the submitting principal.
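For reference, this is roughly how we create the cluster today; the cluster name, region, and key file path below are placeholders rather than our real values:

    gcloud dataproc clusters create my-cluster \
        --region=us-central1 \
        --properties=core:fs.gs.auth.service.account.json.keyfile=/path/to/keyfile.json

Whichever service account credentials are configured this way, every job that runs on the cluster reads and writes Cloud Storage with that single identity.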
In our case we submit jobs using gcloud dataproc jobs submit hadoop, because my idea is to control access to the Dataproc cluster with IAM roles. However, during job submission the user principal is not forwarded to the Hadoop cluster, the gcloud CLI performs no client-side access validation on the storage buckets, and the job is always executed as the root user. Is there a way to map users to their own service accounts, or do you have another solution for this case?

All we need is that a Hadoop MapReduce job submitted by a user via gcloud dataproc jobs submit hadoop can only use the storage buckets or folders that the user has access to.
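As an illustration, a submission looks something like the following (the cluster, bucket, and jar names are placeholders):

    gcloud dataproc jobs submit hadoop \
        --cluster=my-cluster \
        --region=us-central1 \
        --jar=gs://my-bucket/jars/wordcount.jar \
        -- gs://my-bucket/input gs://my-bucket/output

IAM checks whether the submitting user may create the job (dataproc.jobs.create), but once the job runs on the cluster, the reads and writes against gs://my-bucket/... are done with the cluster's service account rather than as the submitting user.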
Current:
gcloud dataproc jobs (IAM - user principal) -> Dataproc Cluster (IAM - user principal) -> (SA Default/custom) -> Storage Bucket
Any user who is allowed to submit jobs to the Dataproc cluster can use any storage bucket the service account has access to.
Required:
gcloud dataproc jobs (IAM - user principal) -> Dataproc Cluster (IAM - user principal) -> (IAM - user principal) -> Storage Bucket
A user who is allowed to submit jobs to the Dataproc cluster should only be able to use the storage buckets that the user's own account has access to.
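For example, if a bucket's IAM policy only grants a single user access, e.g. (user and bucket names are placeholders):

    gsutil iam ch user:alice@example.com:objectViewer gs://team-a-bucket

then a job submitted by alice@example.com should be able to read gs://team-a-bucket, while jobs submitted by other users of the same cluster should be denied.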
So far I haven't found a way to do this. Can you please help me with it?
Is there any workaround or solution available to this problem?