We recently had a Dataflow job fail because it could not find a JSON file to load into BigQuery. My understanding is that this JSON file is generated by Dataflow itself, so this is an unexpected state.

The job has been running daily for almost a year, and this is the first time we've seen this error. The subsequent run succeeded.

Oct 26, 2015, 3:13:32 PM S15: (1c654a773802760a): Workflow failed. Causes: (1c654a773802735f): BigQuery import job "dataflow_job_11909924374132686736" failed. Causes: (1c654a77380270b4): BigQuery job "dataflow_job_11909924374132686736" in project "project_name" finished with error(s): job error: Not found: Google Storage File gs://cdf/binaries/denormailization/11909924374132684847/-00081-of-00120.json, error: Not found: Google Storage File gs://cdf/binaries/denormailization/11909924374132684847/-00081-of-00120.json

Job id: 2015-10-25_21_01_46-11909924374132686437

Hey! From the way you describe it, this sounds more like a bug than something that can be solved on Stack Overflow. I would suggest posting to the Dataflow UserVoice forum here: googlecloudplatform.uservoice.com/forums/302628-dataflow/… - Patrice

Dataflow writes bounded PCollections to BigQuery by writing to temporary files and then running a BigQuery import job to load the data into the BigQuery table. It's pretty unexpected to see a file go missing. We're investigating internally to see why this might have happened. Please let us know if you see it again. - Frances

Thanks @Frances, we haven't seen it since, but we will let you know if we do. - matthewd

1 Answer

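Probably a missing file. The error says that one of the files the BigQuery import job expected was not in Cloud Storage when the job ran: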

Not found: Google Storage File gs://cdf/binaries/denormailization/11909924374132684847/-00081-of-00120.json
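
The name -00081-of-00120.json suggests the output was split into 120 numbered files. If it happens again, it may help to check which shards are actually present. Here is a minimal sketch, assuming the `-NNNNN-of-MMMMM.json` naming seen in the error message (inferred from the log only, not from documented Dataflow behavior). `missing_shards` is a hypothetical helper, and the object names are assumed to come from a listing of the directory, for example the output of `gsutil ls`:

```python
import re

# Assumed shard naming, inferred from the error message only:
# "<prefix>-NNNNN-of-MMMMM.json"
SHARD_RE = re.compile(r"^(?P<prefix>.*)-(?P<index>\d{5})-of-(?P<total>\d{5})\.json$")


def missing_shards(names):
    """Return the sorted shard indices that do not appear in `names`.

    Hypothetical helper: `names` is any iterable of object names or paths.
    Names that do not match the assumed pattern are ignored.
    """
    seen = set()
    total = None
    for name in names:
        m = SHARD_RE.match(name)
        if m is None:
            continue  # not a shard file
        seen.add(int(m.group("index")))
        total = int(m.group("total"))
    if total is None:
        return []  # no shard files found, nothing to compare against
    return [i for i in range(total) if i not in seen]


if __name__ == "__main__":
    # Simulated listing with shard 81 absent, as in the error above.
    listing = ["-%05d-of-00120.json" % i for i in range(120) if i != 81]
    print(missing_shards(listing))
```

This only compares names against the total encoded in them; it says nothing about why a file is missing.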