I am a beginner in cloud and would like to limit my Dataproc cluster's access to specific GCS buckets in my project.
Let's say I have created a service account named 'data-proc-service-account@my-cloud-project.iam.gserviceaccount.com', and then I create a Dataproc cluster and assign this service account to it.
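For reference, this is roughly how the cluster gets created with the google-cloud-dataproc Python client (the region and cluster name below are placeholders I made up; I only include it to show where the service account is assigned):

    from google.cloud import dataproc_v1

    region = "us-central1"  # placeholder region
    cluster_client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    cluster = {
        "project_id": "my-cloud-project",
        "cluster_name": "my-test-cluster",  # placeholder name
        "config": {
            "gce_cluster_config": {
                # The cluster VMs run as this service account, so whatever
                # permissions it has are what my Spark jobs get on GCS.
                "service_account": (
                    "data-proc-service-account@"
                    "my-cloud-project.iam.gserviceaccount.com"
                ),
            },
        },
    }

    operation = cluster_client.create_cluster(
        request={
            "project_id": "my-cloud-project",
            "region": region,
            "cluster": cluster,
        }
    )
    operation.result()  # wait for the cluster to come up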
Now I have created two GCS locations (prefixes) inside a bucket:
'gs://my-test-bucket/spark-input-files/'
'gs://my-test-bucket/spark-output-files/'
The first location holds the input files that need to be read by the Spark jobs running on my Dataproc cluster, and the second is the location where my Spark jobs write their output files.
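To illustrate, the jobs do something along these lines (the file format and the output subpath are just placeholders, the point is only read-from-input, write-to-output):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("gcs-read-write-example").getOrCreate()

    # Read input files from the input location.
    df = spark.read.text("gs://my-test-bucket/spark-input-files/")

    # ... some transformations would go here ...

    # Write results to the output location.
    df.write.mode("overwrite").text("gs://my-test-bucket/spark-output-files/run-1/")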
I think I have to go and edit my bucket permissions as shown in the given link: Edit Bucket Permission
I want my Spark jobs to be able to read files only from 'gs://my-test-bucket/spark-input-files/', and, if they write to GCS, to be able to write only to 'gs://my-test-bucket/spark-output-files/'.
My question is (most likely a question about resource-level permissions):
Which IAM permissions need to be added to my Dataproc service account
data-proc-service-account@my-cloud-project.iam.gserviceaccount.com
on the IAM console page?
And which read/write permissions need to be added for the specific buckets, which I believe has to be configured by adding the service account as a member and assigning the right role to it (as shown in the link mentioned above)?
Do I need to add my Dataproc service account as a member with the two roles below? Will this work?
Storage Object Creator for 'gs://my-test-bucket/spark-output-files/'
Storage Object Viewer for 'gs://my-test-bucket/spark-input-files/'
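In code form, what I am planning amounts to something like this with the google-cloud-storage Python client (note that these bindings end up at the level of the whole 'my-test-bucket' bucket; I don't know yet how to scope them to just the two prefixes, which is part of what I'm asking):

    from google.cloud import storage

    member = ("serviceAccount:"
              "data-proc-service-account@my-cloud-project.iam.gserviceaccount.com")

    client = storage.Client(project="my-cloud-project")
    bucket = client.bucket("my-test-bucket")

    # Version 3 policies would be needed if conditions are attached later.
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.version = 3

    # Read-only role, intended for the spark-input-files/ location.
    policy.bindings.append(
        {"role": "roles/storage.objectViewer", "members": {member}}
    )
    # Create-only role, intended for the spark-output-files/ location.
    policy.bindings.append(
        {"role": "roles/storage.objectCreator", "members": {member}}
    )

    bucket.set_iam_policy(policy)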
Also let me know if I have missed anything or if something better can be done.