
I'm using the standard GCP-provided Cloud Storage text file to Pub/Sub Dataflow template, but even though I have set the number of worker nodes to 1, the throughput of processed messages is too high for the downstream components.

The Cloud Function that runs on each Pub/Sub message event hits GCP quotas, and with Cloud Run I get a bunch of 500, 429 and 503 errors in the beginning (due to the steep burst rate).

Is there any way to control the processing rate of Dataflow? I need a softer/slower start so that the downstream components have time to scale up.

Anyone?


1 Answer


You can use a stateful ParDo to achieve this: buffer events in batches per key and make a single API call with all of the buffered elements at once. This is explained very nicely, with code snippets, here
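
As a rough illustration of the idea, here is a minimal sketch of a stateful DoFn in the Apache Beam Python SDK that buffers elements in a BagState and flushes them either when the batch is full or after a processing-time timer fires. The class name, batch size, and flush delay are my own placeholders, not something from the linked snippets.

    import time

    import apache_beam as beam
    from apache_beam.coders import StrUtf8Coder
    from apache_beam.transforms.timeutil import TimeDomain
    from apache_beam.transforms.userstate import BagStateSpec, TimerSpec, on_timer


    class BufferedBatchDoFn(beam.DoFn):
        """Buffers keyed elements and emits them as batches to throttle output."""

        BUFFER = BagStateSpec('buffer', StrUtf8Coder())
        FLUSH_TIMER = TimerSpec('flush', TimeDomain.REAL_TIME)

        def __init__(self, max_batch_size=100, flush_delay_secs=10):
            self.max_batch_size = max_batch_size
            self.flush_delay_secs = flush_delay_secs

        def process(self,
                    element,
                    buffer=beam.DoFn.StateParam(BUFFER),
                    flush_timer=beam.DoFn.TimerParam(FLUSH_TIMER)):
            # Stateful DoFns require keyed input; state is scoped per key.
            _, value = element
            buffer.add(value)
            buffered = list(buffer.read())
            if len(buffered) >= self.max_batch_size:
                # Full batch: emit it downstream and clear the buffer.
                buffer.clear()
                yield buffered
            else:
                # Arm a processing-time timer so a partial batch is still
                # flushed after flush_delay_secs.
                flush_timer.set(time.time() + self.flush_delay_secs)

        @on_timer(FLUSH_TIMER)
        def flush(self, buffer=beam.DoFn.StateParam(BUFFER)):
            # Timer callback: emit whatever is left in the buffer.
            buffered = list(buffer.read())
            if buffered:
                buffer.clear()
                yield buffered

Because state is per key, you need to key the Pub/Sub messages before this step (for example with a small hash-based shard key such as beam.Map(lambda m: (hash(m) % 10, m))); the number of shards then bounds how many batches can be in flight at once, which is what gives the downstream API calls their slower, steadier rate.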