1
votes

We are using apache-beam python 2.3 with Google Cloud Dataflow. For about two weeks now, the Cloud Dataflow dashboard at https://console.cloud.google.com/dataflow has been heavily delayed for us (by about 30 minutes to 1 hour).

This comes in 2 flavours:

  • Newly started jobs do not show up in the Overview, and the status link that Beam provides for the detailed job status page fails with the error "Job not found".

  • When jobs finally do appear, they often show a status of "running" although they have in fact already finished.

The same happens when querying the status via the gcloud CLI tool (for example "gcloud dataflow jobs list").

Eventually (after up to 2h) all jobs are updated and displayed correctly.

Now, my question is: what is the reason for this, and how can I get an up-to-date dashboard? Am I possibly doing something wrong when running the job, or do I need to pass another parameter?

We run all jobs in region europe-west1, with all workers in zone=europe-west3-a (Frankfurt/Germany) due to data privacy regulations on the data we are working with.

2
I have submitted an issue here. Feel free to add extra information via comments. - Robbe
I have also submitted a public issue here. - Xiaoxia Lin
I recommend that you follow the updates on the issue here in the GCP Public Issue Tracker by starring the case. - Katayoon

2 Answers

0
votes

We are seeing this as well (also in europe-west1-c).

While Google figures this out, one workaround we use is to open an old job that is already in the list and replace the job ID in the URL directly. The new job and all its related information will then display on the page. Not a perfect solution, but it works for now.

When you start your code, it should print something like 'Job 2018-03-06_09_31_00-13061856958687011068 submitted'; that is the ID you need to put into the URL.
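If you do this often, the ID swap can be scripted. The following is only a rough sketch: the job ID pattern is an assumption inferred from the single example ID above, the function names are made up for illustration, and the URL you pass in should be copied from an old job page that actually opens in your browser, since nothing here assumes a particular console URL layout.

```python
import re

# Assumed shape of a Dataflow job ID, inferred from the one example above:
# YYYY-MM-DD_HH_MM_SS-<digits>. Adjust if your IDs look different.
JOB_ID_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}_\d{2}_\d{2}_\d{2}-\d+")


def extract_job_id(log_line):
    """Return the first job ID found in a log line, or None if there is none."""
    match = JOB_ID_PATTERN.search(log_line)
    return match.group(0) if match else None


def swap_job_id(old_job_url, new_job_id):
    """Replace the job ID inside the URL of an already visible job."""
    # A function is used as the replacement so the new ID is inserted literally.
    new_url, count = JOB_ID_PATTERN.subn(lambda _m: new_job_id, old_job_url, count=1)
    if count == 0:
        raise ValueError("no job ID found in URL: %s" % old_job_url)
    return new_url
```

Call extract_job_id on the 'Job ... submitted' line that your pipeline prints, then pass the result to swap_job_id together with the URL of any older job from the list.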

By the way, it doesn't seem related to the 2.2.3 upgrade, as we started seeing this issue a couple of weeks ago while still running 2.2.0.

0
votes

There were some OOM crashes in the listjobs server, which delayed dashboard updates. The issue has now been resolved.