@dennis-huo
Using non-default service account in Google Cloud Dataproc
In continuation of the problem above:

I want to set up a Dataproc cluster for multiple users. The Compute Engine VMs of a Dataproc cluster connect to storage buckets using the default service account or a custom service account's credentials (e.g. via --properties core:fs.gs.auth.service.account.json.keyfile). Those credentials have no relation to the user principal who submits the jobs, and I couldn't find an option to control this. That makes the Dataproc cluster insecure in a multi-user environment: it introduces another level of indirection, because the key file used does not correspond to the submitting principal.
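For reference, this is roughly how we create the cluster today; the cluster name, region, and key file path below are placeholders rather than our real values:

    gcloud dataproc clusters create my-cluster \
        --region=us-central1 \
        --properties=core:fs.gs.auth.service.account.json.keyfile=/path/to/keyfile.json

Whichever service account credentials are configured this way, every job that runs on the cluster reads and writes Cloud Storage with that single identity.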
In our case we submit jobs using gcloud dataproc jobs submit hadoop, because my idea is to control access to the Dataproc cluster with IAM roles. However, during job submission the user principal is not forwarded to the Hadoop cluster, the gcloud CLI performs no client-side access validation on the storage buckets, and the job is always executed as the root user. Is there a way to map users to their own service accounts, or do you have another solution for this case?

All we need is that a Hadoop MapReduce job submitted by a user via gcloud dataproc jobs submit hadoop can only use the storage buckets or folders that the user has access to.
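As an illustration, a submission looks something like the following (the cluster, bucket, and jar names are placeholders):

    gcloud dataproc jobs submit hadoop \
        --cluster=my-cluster \
        --region=us-central1 \
        --jar=gs://my-bucket/jars/wordcount.jar \
        -- gs://my-bucket/input gs://my-bucket/output

IAM checks whether the submitting user may create the job (dataproc.jobs.create), but once the job runs on the cluster, the reads and writes against gs://my-bucket/... are done with the cluster's service account rather than as the submitting user.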
Current:
gcloud dataproc jobs (IAM - user principal) -> Dataproc Cluster (IAM - user principal) -> (SA Default/custom) -> Storage Bucket
Any user who is allowed to submit jobs to the Dataproc cluster can use any storage bucket the service account has access to.
Required:
gcloud dataproc jobs (IAM - user principal) -> Dataproc Cluster (IAM - user principal) -> (IAM - user principal) -> Storage Bucket
A user who is allowed to submit jobs to the Dataproc cluster should only be able to use the storage buckets that the user's own account has access to.
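For example, if a bucket's IAM policy only grants a single user access, e.g. (user and bucket names are placeholders):

    gsutil iam ch user:alice@example.com:objectViewer gs://team-a-bucket

then a job submitted by alice@example.com should be able to read gs://team-a-bucket, while jobs submitted by other users of the same cluster should be denied.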
So far I haven't found a way to do this. Can you please help me with it?
Is there any workaround or solution available to this problem?