0
votes

I've been testing Dataflow for a little while, and today I ran into a few failures. The error messages were:

Causes: (b8a42e32d0888f60): Unable to rename ouput files from gs://clickstream_current/tmp/dataflow/staging/clickstream/8977742977819433140/dax-tmp-2015-04-14_15_58_06-5441905562239213266-S01-1-e70068cb69ef966a/@DAX.json to gs://clickstream_current/tmp/dataflow/staging/clickstream/8977742977819433140/@*.json. Causes: (b8a42e32d0888fdb): Unable to rename "gs://clickstream_current/tmp/dataflow/staging/clickstream/8977742977819433140/dax-tmp-2015-04-14_15_58_06-5441905562239213266-S01-1-e70068cb69ef966a/-shard-00000-of-00940-endshard.json" to "gs://clickstream_current/tmp/dataflow/staging/clickstream/8977742977819433140/-00000-of-00940.json.

Could this be a GCS issue? Besides the failures, there were warnings complaining about "Unable delete temporary files from GCS folders". Is there anything I can do to avoid this?

1
Is this failure happening consistently? Have you tried your job on a smaller dataset and/or increasing the number of workers? - Jeremy Lewi
Does the object gs://clickstream_current/tmp/dataflow/staging/clickstream/8977742977819433140/dax-tmp-2015-04-14_15_58_06-5441905562239213266-S01-1-e70068cb69ef966a/-shard-00000-of-00940-endshard.json actually exist? - Jeremy Lewi
What types of Write transforms is your pipeline using? e.g. BigQueryIO, TextIO, AvroIO? - Jeremy Lewi
It didn't happen consistently. The job succeeded when it ran on a smaller dataset (20 times smaller, which worries me). - Echo
When I checked for the file, it didn't exist. - Echo

1 Answer

1
votes

We've identified an issue in the service that will cause this failure in certain rare circumstances. We are working on fixing this issue. In the meantime we apologize for the inconvenience. The error is slightly more likely if you are using a BigQueryIO.Write to output your data.