
I have simple Dataprep jobs that transfer GCS data to BigQuery. Until today, the scheduled jobs were running fine, but today two jobs failed and two jobs succeeded only after taking between half an hour and an hour. The error message I am getting is below:

java.lang.RuntimeException: Failed to create job with prefix beam_load_clouddataprepcmreportalllobmedia4505510bydataprepadmi_aef678fce2f441eaa9732418fc1a6485_2b57eddf335d0c0b09e3000a805a73d6_00001_00000, reached max retries: 3, last failed job:

I ran the same job again; it again took a very long time and failed, but this time with a different message:

Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h. Please check the worker logs in Stackdriver Logging. You can also get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.

Any pointers or direction on the possible cause? Also, links or troubleshooting tips for Dataprep or Dataflow jobs are appreciated.

Thank you


1 Answer


There could be many potential causes for the jobs to get stuck: transient issues, some quota or limit being reached, a change in data format or size, or another issue with the resources being used. I suggest starting the troubleshooting on the Dataflow side.
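For example, here is a minimal sketch for pulling the error-level worker logs with the gcloud CLI (the region and JOB_ID below are placeholders; copy the real job ID from the Dataflow console or the listing):

    # List recent Dataflow jobs to find the ID of the stuck job
    gcloud dataflow jobs list --region=us-central1

    # Read error-level worker logs for that job from Stackdriver Logging
    gcloud logging read 'resource.type="dataflow_step" AND resource.labels.job_id="JOB_ID" AND severity>=ERROR' --limit=50

If the workers produced no logs at all, that often points to the workers failing to start in the first place, for example because a CPU or in-use IP address quota is exhausted in the region.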

Here are some useful resources that can guide you through the most common job errors and how to troubleshoot them:

Troubleshooting your pipeline: https://cloud.google.com/dataflow/docs/guides/troubleshooting-your-pipeline
Common error guidance: https://cloud.google.com/dataflow/docs/guides/common-errors
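Also, since the first error mentions a failed beam_load job, you can inspect the underlying BigQuery load jobs directly. A minimal sketch with the bq CLI (the job ID shown is hypothetical; copy a real one from the listing):

    # List recent BigQuery jobs across all users, including failed ones
    bq ls -j -a --max_results=20

    # Show the full error details of a specific load job
    bq show -j bqjob_r1234567890_0000012345ab_1

The error details on the failed load job are usually more specific than the generic "reached max retries" message surfaced by Dataflow.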

In addition, you could check the Google Issue Tracker for Dataprep and Dataflow to see if the issue has been reported before.

You can also look at the GCP status dashboard (https://status.cloud.google.com/) to rule out a widespread issue with one of the services.

Finally, if you have GCP support, you can reach out to them directly. If you don't, you can use the Issue Tracker to create a new issue for Dataprep and report the behavior you're seeing.