
I've made one cloud function to do the following:

  1. Get the order IDs filtered on the day before today (around 400 IDs), once at night.
  2. For each ID, get detailed information from the source.
  3. For each set of detailed information, get extra information from the target.
  4. Send it to the target as an invoice.

My problem is in step 2. The rate limit is 14 requests per minute. So I was thinking of putting a Pub/Sub topic between steps 1 and 2 and creating a subscription function that pulls messages from the topic, processes 14 of them, acks those messages, and then resolves the promise (a sketch of what I mean follows the questions below). But this leaves me with questions:

  1. Is this the right flow?
  2. How do I schedule step 2?
  • If I get a 429 response, wait for 1 minute (this waiting time will be billed).
  • Let step 2 run every minute so I never get the 429 response code? (This will run for large parts of the day when there are no messages.)
  • Instead of pulling, let Pub/Sub trigger the function and check the response code: if it is 429, wait for 1 minute before processing. But then my question: will it invoke one instance of the cloud function or multiple?
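
To make the pull-and-ack idea concrete, this is roughly what I have in mind for step 2 (just a sketch assuming the @google-cloud/pubsub Node client; the project/subscription names and processOrder are placeholders for my real code):

```typescript
import {v1} from '@google-cloud/pubsub';

const subClient = new v1.SubscriberClient();

// Placeholder for the rate-limited call to the source API (step 2).
async function processOrder(orderId: string): Promise<void> {
  // ... fetch detailed information for this order id ...
}

export async function processBatch(): Promise<void> {
  const subscription = subClient.subscriptionPath('my-project', 'order-ids-sub');

  // Synchronous pull: at most 14 messages, matching the 14 requests/minute limit.
  const [response] = await subClient.pull({subscription, maxMessages: 14});

  const ackIds: string[] = [];
  for (const received of response.receivedMessages ?? []) {
    const orderId = received.message?.data?.toString() ?? '';
    await processOrder(orderId);
    if (received.ackId) ackIds.push(received.ackId);
  }

  // Ack only after successful processing, so unprocessed messages are redelivered.
  if (ackIds.length > 0) {
    await subClient.acknowledge({subscription, ackIds});
  }
}
```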

I hope someone can share some thoughts. I'm new to cloud functions and async programming. My whole function works as expected with a limit of 5 orders; only later on did I come across this rate limit (I know, stupid me).


1 Answer


Personally, I think it may be possible to use Firestore to store the state of the process for each id (document id).

I mean that the first function creates (about) 400 documents in a Firestore collection - one document per id from the question. Each document can then be used as a "state machine" and as a "log" for processing the given id - from detailed information retrieval to invoice generation and posting...

In addition, the first function can send a pubsub message with the document id (about 400 messages in total).
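
A minimal sketch of that first function, assuming firebase-admin for Firestore and @google-cloud/pubsub for publishing (the 'orders' collection and 'order-ids' topic names are only examples):

```typescript
import * as admin from 'firebase-admin';
import {PubSub} from '@google-cloud/pubsub';

admin.initializeApp();
const db = admin.firestore();
const pubsub = new PubSub();

export async function fanOut(orderIds: string[]): Promise<void> {
  const topic = pubsub.topic('order-ids');

  for (const id of orderIds) {
    // One document per order id: the "state machine" for that order.
    await db.collection('orders').doc(id).set({
      state: 'PENDING',
      history: [{state: 'PENDING', at: admin.firestore.Timestamp.now()}],
    });

    // One message per order id for the second function to consume.
    await topic.publishMessage({json: {orderId: id}});
  }
}
```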

On the other side of the pubsub topic there is the second cloud function. I would calculate the maximum number of its instances allowed to run in parallel to minimise API rate exceptions, but if the exception is received - it is not a problem. This function does the real job and updates the state of the Firestore document (i.e. "DONE") in case of success, or some other state in case of an exception...
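
A sketch of that second function, assuming the firebase-functions Node API (fetchDetails and sendInvoice stand for the real rate-limited calls; maxInstances caps the parallel instances as mentioned):

```typescript
import * as functions from 'firebase-functions';
import * as admin from 'firebase-admin';

admin.initializeApp();
const db = admin.firestore();

// Placeholders for the real calls in steps 2-4.
declare function fetchDetails(orderId: string): Promise<unknown>;
declare function sendInvoice(details: unknown): Promise<void>;

export const processOrder = functions
  .runWith({maxInstances: 1}) // cap parallelism to stay near the API rate limit
  .pubsub.topic('order-ids')
  .onPublish(async (message) => {
    const {orderId} = message.json as {orderId: string};
    const ref = db.collection('orders').doc(orderId);

    try {
      const details = await fetchDetails(orderId);
      await sendInvoice(details);
      await ref.update({
        state: 'DONE',
        history: admin.firestore.FieldValue.arrayUnion(
            {state: 'DONE', at: admin.firestore.Timestamp.now()}),
      });
    } catch (err) {
      // Not a problem: record the failure and let the scheduled
      // "self-healing" function (below) re-enqueue it.
      await ref.update({
        state: 'EXCEPTION',
        history: admin.firestore.FieldValue.arrayUnion(
            {state: 'EXCEPTION', at: admin.firestore.Timestamp.now()}),
      });
    }
  });
```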

Then we have a scheduler (let's say once every 15 minutes, or 10, or 20 - I don't know the context), a pubsub topic and another cloud function. This cloud function (upon getting a message from the scheduler) scans the Firestore collection and, for the documents in the 'exception' state, sends a message into the first pubsub topic (see above) so that the document can be reprocessed - this is a "self-healing" process...
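
Sketched with a scheduled firebase-functions trigger (same assumed collection/topic names as above):

```typescript
import * as functions from 'firebase-functions';
import * as admin from 'firebase-admin';
import {PubSub} from '@google-cloud/pubsub';

admin.initializeApp();
const db = admin.firestore();
const pubsub = new PubSub();

export const retryFailed = functions.pubsub
    .schedule('every 15 minutes')
    .onRun(async () => {
      const failed = await db.collection('orders')
          .where('state', '==', 'EXCEPTION')
          .get();

      for (const doc of failed.docs) {
        // Back into the first topic, so the normal worker reprocesses it.
        await pubsub.topic('order-ids').publishMessage({json: {orderId: doc.id}});
        await doc.ref.update({state: 'RETRYING'});
      }
    });
```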

The process "log" (state updates, etc.) can be collected in the Firestore document, so each of them gets a "history" of how (and how long) the whole process took place...

In addition, I would use a dedicated Stackdriver log to monitor how it is going (if the number of exceptions per time unit exceeds some limit, it can trigger an alarm, for example). The log messages should contain more or less the same information as the "log" (and other details) in the Firestore collection.
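
For example (a sketch only: on the Node runtime, a JSON line written to stdout is picked up by Cloud Logging as a structured entry, and severity-based alerting can be built on top of it):

```typescript
// Hypothetical helper: mirror each Firestore state update into the log.
function logState(orderId: string, state: string, detail?: string): void {
  console.log(JSON.stringify({
    severity: state === 'EXCEPTION' ? 'ERROR' : 'INFO',
    message: `order ${orderId} -> ${state}`,
    orderId,
    state,
    detail,
  }));
}
```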

Then I would use a sink into BigQuery and build some reports/dashboards (if required)...

Finally, there should be a custom service account to run those cloud functions, with the relevant IAM roles to work with the GCP resources (pubsub, firestore, Stackdriver, etc.)...