
We have a Dataflow job with low system latency but a high "data freshness" (or "data watermark lag").

After upgrading to Beam 2.15 (from 2.12), we see that this metric keeps increasing, which would normally indicate that something is stuck in the pipeline. However, this is not the case: all data was consumed from the PubSub subscription. Permissions also seem OK, since we can consume (unless that is not enough?).

We also checked the individual watermarks on all components of the pipeline, and they are fine (very recent).
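
For context, the consuming side is a standard PubsubIO read in the Beam Java SDK. Here is a minimal sketch with placeholder names (our real job has more steps downstream); the comment on timestamp attributes matters because that choice determines how the watermark is estimated:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class WatermarkLagRepro {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline.apply("ReadFromPubSub",
        PubsubIO.readStrings()
            .fromSubscription("projects/my-project/subscriptions/my-subscription")
            // With a plain read, event timestamps (and thus the watermark)
            // come from the Pub/Sub publish time. If .withTimestampAttribute("ts")
            // is used instead, the watermark is estimated from that attribute,
            // and missing or skewed values there can make the lag grow.
    );

    pipeline.run();
  }
}
```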

[Screenshot: Increasing data watermark]

Thanks!

If you're using the Python SDK, the issue might be that Dataflow doesn't yet support properly calculating these metrics. I'm seeing the same thing on my Dataflow jobs. - andreimarinescu
@andreimarinescu Thanks for your answer! While we have some Python jobs running, this particular one is actually a Java job :/ But it's still good to know that we won't get these metrics for Python. - Jonny5
@Jonny5 I am facing the same problem; did you find a solution? - davideanastasia
My feeling is that some kind of incompatibility between 2.15.0 and the previous version is causing this. Rolling out 2.15.0 fresh (no update) or 2.13.0 gives me no problems, but updating a job from 2.13.0 to 2.15.0 does. - davideanastasia
Still the same issue on our side. At some point everything was OK again (a sudden drop in freshness), but on our new job it is increasing again :( - Jonny5

1 Answer


This is indeed quite odd. Here are some reasons why you might be seeing this:

  1. There may be a bug in the new Beam SDK, or in Dataflow's watermark estimation.
  2. You may have updated the topology of your pipeline and hit a bug in watermark calculation across the old/new topology.
  3. The job may indeed be stuck, and you may have missed some data that actually did not make it across the pipeline.
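
One way to gather evidence for which of these applies is to read the freshness metric the backend reports from Cloud Monitoring and compare it against the per-step watermarks you already checked. A minimal sketch using the Cloud Monitoring Java client; the project ID is a placeholder, and I'm assuming the `dataflow.googleapis.com/job/data_watermark_age` metric, which is (as far as I know) what the data freshness graph is based on:

```java
import com.google.cloud.monitoring.v3.MetricServiceClient;
import com.google.monitoring.v3.ListTimeSeriesRequest;
import com.google.monitoring.v3.ProjectName;
import com.google.monitoring.v3.TimeInterval;
import com.google.monitoring.v3.TimeSeries;
import com.google.protobuf.util.Timestamps;

public class FreshnessProbe {
  public static void main(String[] args) throws Exception {
    long nowMillis = System.currentTimeMillis();
    TimeInterval lastHour = TimeInterval.newBuilder()
        .setStartTime(Timestamps.fromMillis(nowMillis - 60 * 60 * 1000))
        .setEndTime(Timestamps.fromMillis(nowMillis))
        .build();

    ListTimeSeriesRequest request = ListTimeSeriesRequest.newBuilder()
        .setName(ProjectName.of("my-project").toString())  // placeholder project
        // Optionally narrow this filter further using the job's resource labels.
        .setFilter("metric.type=\"dataflow.googleapis.com/job/data_watermark_age\"")
        .setInterval(lastHour)
        .setView(ListTimeSeriesRequest.TimeSeriesView.FULL)
        .build();

    try (MetricServiceClient client = MetricServiceClient.create()) {
      for (TimeSeries series : client.listTimeSeries(request).iterateAll()) {
        // Each point is the watermark age at that moment; a steadily growing
        // value matches the "increasing data freshness" symptom.
        series.getPointsList().forEach(point ->
            System.out.println(Timestamps.toString(point.getInterval().getEndTime())
                + " -> " + point.getValue()));
      }
    }
  }
}
```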

My advice, if you're seeing this, is to open a case with Dataflow support.