Trying to make my question as broad as possible:
When writing an unbounded PCollection to a GCS bucket using TextIO, while using a service account that follows the principle of least privilege and therefore does not have GCS deletion access, the following error occurs in Dataflow:
Error trying to copy gs://[Temporary beam file] to gs://[JSON We expect]: {"code":403,"errors":[{"domain":"global","message":"[Service Account] does not have storage.objects.delete access to [JSONFile]","reason":"forbidden"}],"message":"[Service Account] does not have storage.objects.delete access to [JSON File]"}
The above error makes sense: we are deliberately not granting the service account deletion access to the bucket, and the Dataflow pipeline is attempting to clean up the temporary shard files it wrote.
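For context, the write in question looks roughly like the sketch below (bucket, prefix, window size, and shard count are placeholders, not our actual values). TextIO's windowed-write finalization stages each pane in temporary files and then copies them to their final names and deletes the temporaries, which appears to be where the storage.objects.delete call comes from:

```java
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class WindowedTextIoWrite {
  // jsonLines is the unbounded PCollection<String> read earlier in the pipeline.
  static void writeJson(PCollection<String> jsonLines) {
    jsonLines
        .apply(Window.into(FixedWindows.of(Duration.standardMinutes(5))))
        .apply(TextIO.write()
            .to("gs://example-output-bucket/json/part") // placeholder bucket/prefix
            .withSuffix(".json")
            .withWindowedWrites()  // required when writing an unbounded PCollection
            .withNumShards(1));    // finalization copies temp shards, then deletes them
  }
}
```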
The question, however, is: is the best practice at this point to grant deletion access to the Dataflow service account and keep using TextIO? Or would it be better to apply a DoFn to the PCollection we want to ingest and write each individual element into the GCS bucket incrementally via the GCS API (sketched below), thus sidestepping the shard-cleanup issue entirely?
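For clarity, the DoFn alternative I have in mind would look something like this (a sketch assuming a PCollection<String> of JSON lines and the google-cloud-storage client library; the bucket name and object naming scheme are placeholders). It should only need storage.objects.create, though since the write is a side effect, retried bundles could produce duplicate objects:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import org.apache.beam.sdk.transforms.DoFn;

public class WriteElementToGcsFn extends DoFn<String, Void> {
  private static final String BUCKET = "example-output-bucket"; // placeholder
  private transient Storage storage;

  @Setup
  public void setup() {
    // Create the GCS client once per DoFn instance, not per element.
    storage = StorageOptions.getDefaultInstance().getService();
  }

  @ProcessElement
  public void processElement(@Element String json) {
    // One object per element; a random name avoids collisions across workers,
    // but also means retried bundles can leave duplicate objects behind.
    BlobId blobId = BlobId.of(BUCKET, "json/" + UUID.randomUUID() + ".json");
    BlobInfo blobInfo = BlobInfo.newBuilder(blobId)
        .setContentType("application/json")
        .build();
    storage.create(blobInfo, json.getBytes(StandardCharsets.UTF_8));
  }
}
```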