
So according to the documentation "Dataflow scales up if a streaming pipeline remains backlogged with workers utilizing, on average, more than 20% of their CPUs, for a couple minutes" (https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#autoscaling). Is there an exact timeframe when Dataflow starts scaling up?

I tested my streaming job using Streaming Engine (with 1 worker by default) to see whether autoscaling works and the number of workers goes up. However, after more than 6 minutes with CPU utilization above 20%, and with a backlog of unacknowledged Pub/Sub messages for roughly the same period, the number of current workers stayed at 1 and no autoscaling happened.
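For context, the job was launched roughly like this (a sketch, not the exact command: the project, region, and bucket values are placeholders, and the flag names are the Beam Python SDK's Dataflow runner options):

```shell
# Launch a streaming Beam pipeline on Dataflow with Streaming Engine and
# autoscaling enabled. Project, region, and bucket are placeholders.
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/temp \
  --streaming \
  --enable_streaming_engine \
  --autoscaling_algorithm=THROUGHPUT_BASED \
  --max_num_workers=5  # with a max of 1, autoscaling has no room to add workers
```

One thing I already checked: if `--max_num_workers` were left at 1, autoscaling would have no headroom to add workers, so I set it higher.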

Also, the autoscaling chart under Job metrics in Dataflow shows: "Current workers: 1, Target workers: 1". What does "target workers" mean, and how does it differ from current workers?

Thanks in advance for any help.


1 Answer


Regarding the autoscaling chart.

  • Current workers - The number of workers the pipeline is currently using.
  • Target workers - The number of workers the Dataflow autoscaling algorithm suggests. This can be higher or lower than the current count (an upscale or a downscale), and the pipeline works toward this target.

Regarding the time for a scaling decision, I don't think the Dataflow service provides an exact guarantee here. Please refer to the streaming autoscaling documentation here. If you think there's an issue related to autoscaling, please contact Dataflow support so that they can look into your specific pipeline.