2
votes

Our Cloud dataflow job has failed this morning with the following error:

Jul 20, 2015, 7:02:57 AM (41114dff4e115290): Workflow failed. Causes: (ba1dfdda2b6af333): Map task completion for Step "Clicks_07_2015-BQ-Read+Clicks_07_2015-ParDoDFP-transform+Clicks_07_2015-BQ-Write" failed. Causes: (3bcd8d4fd3828211): No exported files "gs://path/to/file/*.json" found after export of table "Clicks_07_2015" in dataset "--dataset--" in project "{--project--id--}".

This Job has been running successfully for the past few days without any code changes and has failed this morning. We can see that there is a json file in this cloud storage folder so i'm not sure why this could have failed. Is this a bug?

Job Id: 2015-07-19_14_01_42-8050965853069761045

1

1 Answers

2
votes

When tables are exported from BigQuery to Cloud Storage, they are subject to the eventual consistency properties of that system. In this case, it appears that the index was still stale after repeated retries, at which point Dataflow failed the job. We'll work on handling this particular case better, thank you for your patience.

This should a rare occurrence, but you may find it useful to run the CLI in a retry loop to work around this scenario.

https://cloud.google.com/dataflow/pipelines/dataflow-command-line-intf