
We have a Beam/Dataflow pipeline (using Dataflow SDK 2.0.0-beta3 & running on GCP) that uses the template functionality. Whenever we run it, it always spits out the following warning:

11:05:30,484 0    [main] INFO  org.apache.beam.sdk.util.DefaultBucket - No staging location provided, attempting to use default bucket: dataflow-staging-us-central1-435085767562
11:05:31,930 1446 [main] WARN  org.apache.beam.sdk.util.RetryHttpRequestInitializer - Request failed with code 409, will NOT retry: https://www.googleapis.com/storage/v1/b?predefinedAcl=projectPrivate&predefinedDefaultObjectAcl=projectPrivate&project=<redacted>

However, we are setting the --stagingLocation parameter, and we can see all the binaries/jars etc. are uploaded to the bucket that we've specified in the --stagingLocation parameter.

Yet Beam/Dataflow still creates the following zombie bucket in GCS in our project: dataflow-staging-us-central1-435085767562

Why is this happening if we are clearly setting the --stagingLocation parameter?

Would you be able to provide the command you used to create the template pipeline? Did you specify --stagingLocation when you created it? And what command are you using to launch the pipeline? I'm just trying to understand which commands you are referring to when setting stagingLocation - Alex Amato

1 Answer


I suspect this is BEAM-2143. Specifically, although the warning refers to the staging location, the default-bucket logic actually reads --tempLocation, so you need to specify --tempLocation as well.
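As a sketch, a template-creation invocation that sets both options might look like the following (the main class, project ID, and bucket names here are hypothetical placeholders, not values from your pipeline):

```shell
# Create a Dataflow template, passing both --stagingLocation (where jars are
# uploaded) and --tempLocation (which the default-bucket check reads).
# All names below are placeholders; substitute your own values.
mvn compile exec:java \
  -Dexec.mainClass=com.example.MyPipeline \
  -Dexec.args="--runner=TemplatingDataflowPipelineRunner \
    --project=my-gcp-project \
    --stagingLocation=gs://my-bucket/staging \
    --tempLocation=gs://my-bucket/temp \
    --templateLocation=gs://my-bucket/templates/my-template"
```

With --tempLocation set explicitly, the SDK should no longer fall back to creating a default dataflow-staging-* bucket.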