
I am trying to send data from Google Cloud Pub/Sub, via a DataFlow, to a Google Cloud Storage bucket. I used the template to create the DataFlow.

When I set up the bucket with default permissions, I get a warning in the Job Logs, "somelong#[email protected] does not have storage.objects.get access...", and no data shows up in the bucket.

I added Dataflow Admin permissions to the Viewers of project member, and the warning went away.

It seems that the process is writing to the bucket, not just viewing it, so I'm a) confused as to why this solved my problem, and b) unsure whether this is the correct/appropriate permission to use.

Any info would be appreciated.


1 Answer


somelong#[email protected]

The "somelong#" is the Project Number for your project.

I added Dataflow Admin permissions to the Viewers of project member, and the warning went away.

The role roles/viewer has permission to list buckets, but not to access objects in a bucket.

I am not sure exactly what you did, or where you did it, based upon the provided information. If you added the role roles/dataflow.admin to each user, or to the group those users belong to, that is OK. Dataflow Admin is the correct role if users need to create Dataflow jobs and access the resulting data. However, what is missing is where these jobs are being launched from. They are probably launched from a Compute Engine instance, which is why a service account is listed in the error: the service account also needs permissions. As they say, the answer is in the details, and your question is missing a few.

If the Dataflow job is being launched from a Compute Engine instance, then the Compute Engine default service account (the same one from your error message) needs Dataflow and Cloud Storage permissions. The roles/dataflow.admin role will give the service account the required permissions.
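
As a sketch, a project-level binding for that service account could look something like the command below. The project ID and the service account address are placeholders; substitute your own project ID and the account from your error message:

gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/dataflow.admin"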

If the Dataflow job is being launched from outside the cloud (someone's desktop) then that user's IAM member account needs the permissions.
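
In that case the same kind of binding applies, just with a user member instead of a service account (the email below is a placeholder):

gcloud projects add-iam-policy-binding my-project \
    --member="user:someone@example.com" \
    --role="roles/dataflow.admin"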

To see the permissions that a role has, you can use the CLI:

gcloud iam roles describe roles/dataflow.admin

This returns the following information. The important items are the list of permissions:

description: Minimal role for creating and managing dataflow jobs.
etag: AA==
includedPermissions:
- compute.machineTypes.get
- dataflow.jobs.cancel
- dataflow.jobs.create
- dataflow.jobs.get
- dataflow.jobs.list
- dataflow.jobs.updateContents
- dataflow.messages.list
- dataflow.metrics.get
- resourcemanager.projects.get
- resourcemanager.projects.list
- storage.buckets.get
- storage.objects.create
- storage.objects.get
- storage.objects.list
name: roles/dataflow.admin
stage: GA
title: Dataflow Admin

From this list, you can see that you gave someone/something permissions to create, get, and list objects in your buckets. If the requirement is to provide just the storage permissions, then add the role roles/storage.legacyBucketWriter instead.
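
For example, assuming the role should apply only to the output bucket, you can bind it at the bucket level (bucket name and service account are placeholders):

gsutil iam ch \
    serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com:roles/storage.legacyBucketWriter \
    gs://my-dataflow-output-bucket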