We have a Beam/Dataflow pipeline (using Dataflow SDK 2.0.0-beta3 and running on GCP) that uses the template functionality. Whenever we run it, it always spits out the following warning:
11:05:30,484 0 [main] INFO org.apache.beam.sdk.util.DefaultBucket - No staging location provided, attempting to use default bucket: dataflow-staging-us-central1-435085767562
11:05:31,930 1446 [main] WARN org.apache.beam.sdk.util.RetryHttpRequestInitializer - Request failed with code 409, will NOT retry: https://www.googleapis.com/storage/v1/b?predefinedAcl=projectPrivate&predefinedDefaultObjectAcl=projectPrivate&project=<redacted>
However, we are setting the --stagingLocation parameter, and we can see that all the binaries, JARs, etc. are uploaded to the bucket we specified via --stagingLocation.
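For reference, a minimal sketch of how we configure the pipeline (the class name, bucket names, and paths below are placeholders, not our actual values):

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class MyTemplatePipeline {
  public static void main(String[] args) {
    // Invoked with (placeholder values):
    //   --runner=DataflowRunner --project=<redacted>
    //   --stagingLocation=gs://our-staging-bucket/staging
    //   --templateLocation=gs://our-templates/my-template
    DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(DataflowPipelineOptions.class);

    Pipeline pipeline = Pipeline.create(options);
    // ... transforms elided ...
    pipeline.run();
  }
}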
Yet Beam/Dataflow then creates the following zombie bucket in GCS in our project: dataflow-staging-us-central1-435085767562
Why is this happening if we are clearly setting the --stagingLocation parameter?