I'm trying to run a Beam pipeline (as part of an AI Platform ML training job) on GCP, following the data-processing part of this notebook very closely. When I set Beam to use the DirectRunner, everything runs fine, though slowly. But as soon as I switch to the DataflowRunner, I get an "insufficient authentication scopes" error (traceback shortened):
HttpForbiddenError: HttpError accessing https://dataflow.googleapis.com/v1b3/projects/neurec-252017/locations/us-central1/jobs?alt=json:
response: [...]
content: {
  "error": {
    "code": 403,
    "message": "Request had insufficient authentication scopes.",
    "status": "PERMISSION_DENIED"
  }
}
I have spent quite some time reading answers to similar questions here on SO, to no avail. My understanding is that when I launch a Beam pipeline, it creates a GCE instance on behalf of my project, and that VM does not have the required permissions to write to my GCS bucket.
What I cannot figure out is how to set the right scopes/permissions for that GCE instance, preferably from within the Python code that launches the Beam pipeline rather than from the GCP console.
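The closest thing I can find in the Beam Python SDK is the worker service-account option. A minimal sketch of what I mean, reusing the options object from the snippet above (the account email is hypothetical, and I am not sure this is even the right knob):

```python
from apache_beam.options.pipeline_options import GoogleCloudOptions

# Hypothetical: run the Dataflow worker VMs as a dedicated service account
# instead of the Compute Engine default one. The email is a placeholder.
options.view_as(GoogleCloudOptions).service_account_email = (
    'dataflow-worker@neurec-252017.iam.gserviceaccount.com')
```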
On the console side, I also tried granting the following IAM roles to the Compute Engine default service account ([PROJECT_NUMBER]-compute@developer.gserviceaccount.com):
Compute Instance Admin (v1)
Dataflow Admin
Owner
Storage Admin
But I'm still getting the same error. Any help would be very much appreciated.