
I'm trying to run a Beam pipeline (as part of an AI Platform ML training job) on GCP. I'm following the data processing part of this notebook very closely. When I set Beam to use the DirectRunner, everything runs fine (though it's slow). But as soon as I switch to the DataflowRunner, I get an "insufficient scope" error (traceback shortened):

HttpForbiddenError: HttpError accessing https://dataflow.googleapis.com/v1b3/projects/neurec-252017/locations/us-central1/jobs?alt=json:

response: [...]

content <{ "error": { "code": 403, "message": "Request had insufficient authentication scopes.", "status": "PERMISSION_DENIED" } }

I have spent quite some time reading answers to similar questions here on SO to no avail. What I understand is that when I launch a Beam pipeline it creates a GCE instance on behalf of my project and that VM does not have the required permission to write to my GCS bucket.

What I cannot figure out is how to set the right scope/permissions for that GCE instance (preferably from within the Python code that launches the Beam pipeline, rather than from the GCP console). I tried granting the following roles to the Compute Engine default service account ([PROJECT_NUMBER]-compute@developer.gserviceaccount.com):

Compute Instance Admin (v1)

Dataflow Admin

Owner

Storage Admin

But I'm still getting the same error. Any help will be very much appreciated.
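For reference, a pipeline like this is typically launched with options along these lines; the project, region, bucket, and service-account values below are placeholders, not the actual ones from this job:

import apache_beam as beam
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions, PipelineOptions, StandardOptions)

options = PipelineOptions()

# Runner selection: 'DirectRunner' executes locally, 'DataflowRunner' submits the job to GCP.
options.view_as(StandardOptions).runner = 'DataflowRunner'

gcp = options.view_as(GoogleCloudOptions)
gcp.project = 'my-project'                        # placeholder project ID
gcp.region = 'us-central1'
gcp.job_name = 'preprocess-job'
gcp.staging_location = 'gs://my-bucket/staging'   # placeholder bucket
gcp.temp_location = 'gs://my-bucket/temp'
# Optional: run the Dataflow worker VMs as a specific service account
# instead of the Compute Engine default one (placeholder email).
gcp.service_account_email = 'my-sa@my-project.iam.gserviceaccount.com'

with beam.Pipeline(options=options) as p:
    (p
     | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input.txt')
     | 'Write' >> beam.io.WriteToText('gs://my-bucket/output'))

Setting these options programmatically is equivalent to passing the corresponding command-line flags such as --runner, --project, and --service_account_email.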

Hi Milad, have you been able to resolve this issue since then? I'm running into exactly the same one. – Paul Milovanov

1 Answer


You need the role "Dataflow Admin", which is the minimum role for creating and managing Dataflow jobs.

Assign this role to your own account (the one that launches the Dataflow job), not to the Compute Engine default service account. Then restore the Compute Engine default service account to the roles it had before (remove your changes).
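For example, assuming the job is launched under a user account (the project ID and email below are placeholders), the role can be granted with gcloud:

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:you@example.com" \
    --role="roles/dataflow.admin"

If the job is launched by a service account instead (for example from an AI Platform training job), use a member of the form serviceAccount:NAME@PROJECT_ID.iam.gserviceaccount.com.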

Cloud Dataflow Access Control Guide