Assume we have data coming in through a Google PubSub topic with a spiky traffic pattern: potentially long quiet periods, followed by bursts of data arriving at a high rate for several minutes.

To process that data, suppose we use Dataflow in streaming mode with a subscription-based PubSubIO as the data source. Will the Dataflow job always stay in the running state with the minimum number of workers, or will it be restarted when a burst of data comes in and then stopped once we get back into a quiet period?

1 Answer

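If you enable autoscaling, Dataflow raises or lowers the number of workers dynamically according to load, without restarting the pipeline. The streaming job itself stays in the running state through quiet periods, scaled down toward its minimum number of workers, until you cancel or drain it. You can read more about it here and here.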
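
As a rough illustration of what enabling autoscaling can look like with the Beam Python SDK, here is a minimal sketch that only assembles the pipeline flags. The helper name `dataflow_streaming_flags` and the project, region, and bucket values are hypothetical placeholders; the flags themselves (`--streaming`, `--autoscaling_algorithm`, `--max_num_workers`) are standard Beam pipeline options, but check the defaults that apply to your SDK version.

```python
def dataflow_streaming_flags(project, region, temp_location, max_workers=5):
    """Build flags for a streaming Dataflow job with autoscaling (hypothetical helper)."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Run as an unbounded (streaming) job; it keeps running until cancelled or drained.
        "--streaming",
        # Let the service scale the worker count with load.
        "--autoscaling_algorithm=THROUGHPUT_BASED",
        # Upper bound on workers during a burst.
        f"--max_num_workers={max_workers}",
    ]


# Placeholder values, not taken from the question.
flags = dataflow_streaming_flags(
    "my-project", "us-central1", "gs://my-bucket/tmp", max_workers=10
)
print(" ".join(flags))
```

In a real pipeline you would typically pass this list to `apache_beam.options.pipeline_options.PipelineOptions(flags)` when constructing the `Pipeline`. The `max_workers` cap is the main knob for how far the job can scale up during a burst.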