Just got this error message:

(941d0d42ab1c3aec): Workflow failed. Causes: (941d0d42ab1c3675): The Dataflow appears to be stuck. Please reach out to the Dataflow team at http://stackoverflow.com/questions/tagged/google-cloud-dataflow.

Please help.

job id: 2017-09-11_07_13_42-2005096921586938573bignano

1 Answer

Thanks for sharing the job id. From the Stackdriver logs, I can see that the worker VMs failed to start up because they were not able to fetch the container image from Docker:

Handler for GET /v1.23/images/dataflow.gcr.io/v1beta3/beam-java-batch:beam-0.6.0/json returned error: No such image: dataflow.gcr.io/v1beta3/beam-java-batch:beam-0.6.0

EDIT: After further inspection, I can see there are no staged jars for the job. It seems the stagingFiles is being overridden with just a CSV file: header_H-[..].csv.

If you are specifying the getFilesToStage() option, you must also include the full list of jar files necessary to run your pipeline, because the value you provide replaces the default list rather than adding to it. You can see how the DataflowRunner builds that default list in detectClassPathResourcesToStage(classLoader).
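
To illustrate, here is a minimal sketch of building a complete staging list yourself. It is not the DataflowRunner's own implementation: it assumes your pipeline's jars are all on the JVM classpath (read from the `java.class.path` system property), and the class name `StagingList`, the helper `withExtraFiles`, and the file name `extra-file.csv` are hypothetical placeholders.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Dataflow SDK or Apache Beam.
final class StagingList {

    // Returns every non-empty classpath entry, followed by the extra files.
    static List<String> withExtraFiles(String classPath, List<String> extraFiles) {
        List<String> files = new ArrayList<>();
        for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) {
                files.add(entry); // a jar or class directory the workers will need
            }
        }
        files.addAll(extraFiles); // additional files go after the jars
        return files;
    }

    public static void main(String[] args) {
        // Assumption: the jars needed by the pipeline are on this JVM's classpath.
        List<String> files = withExtraFiles(
                System.getProperty("java.class.path"),
                Collections.singletonList("extra-file.csv")); // placeholder name

        // In a real pipeline you would pass this list to the options object,
        // e.g. options.setFilesToStage(files), instead of printing it.
        for (String f : files) {
            System.out.println(f);
        }
    }
}
```

The point is that the extra file is appended to the jars rather than replacing them, so the workers still receive everything they need to start.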


As an aside, this pipeline is using the Dataflow SDK 0.6, which is currently deprecated. The latest 1.x release is 1.9.1, or you can upgrade to 2.1.0, which is based on Apache Beam.