2
votes

I have a cloud run service which will run upto 60 minutes.The pubsub is the trigger point for execution of cloud run service. pubsub configuration for Retry policy is set to max (600s).

Now when a message is published from pubsub, cloud run starts executing, as the complete execution takes around 60 minutes to complete, but the pubsub message after 600s starts to retry again as it doesn't received any acknowledge from cloud run and again causing cloud run service executing again and again.

How to handle the pubsub retry here so that cloud run will not execute again and again because of retrying.

2

2 Answers

1
votes

I was thinking to use Cloud Tasks, or Cloud Workflows as a proxy for your long running Cloud Run. Unfortunately both services have max timeout of 1800s (30minutes). By the way upcoming callback feature of Cloud Workflows will have 12h timeout. In the meantime I would create a proxy as Cloud Function triggered by PubSub message that will be immediately acknowledged, and the function will call your Cloud Run in async with the PubSub message and return right away.

0
votes

With push subscriptions, such as what you'd use with a Cloud Run service, the maximum ack deadline for a message is indeed 600s. If using pull, one can call ModifyAckDeadline to extend the deadline for a message. In fact, the client libraries for Cloud Pub/Sub do this automatically for up to a configured amount of time (default is 60m).

There is not going to be a way to extend the deadline if using a push subscription. Therefore, your options are:

  1. Switch to a pull subscription. You could potentially do this via Cloud Run, though it would not be the best fit. More likely, you want to spin up a job in an environment that can keep it running without any kind of trigger, e.g., GKE. If you switch to pull, you can extend the ack deadline, though note that duplicates are still possible, even if the ack deadline has not expired or the message has already been acknowledged. They should be rare, but you still have to account for it.

  2. When you receive the message, persist it somewhere, either on disk or in a database, and then acknowledge the message once persisted. Once you are actually done processing the message an hour later, you remove it from this persistent storage. Of course, you could just persist the message instead of publishing it via Pub/Sub and rely on the persistence layer's notifications mechanisms to learn of the new message. For example, if you write to GCS, you could use Cloud Storage notifications via Pub/Sub. In this case, you probably want to have some periodic read from your storage to see if there are any messages that have not been processed for some period of time and if so, reprocess them. For example, if you write with the message the time at which processing started and if more than some amount of time has passed since then and the message is still present, you could start the processing over again.