I'm experimenting with using Cloud Functions as an asynchronous background worker that is triggered by Pub/Sub and does somewhat longer work (on the order of minutes). The complete code is here: https://github.com/zdenulo/cloud-functions-pubsub
My prototype inserts data into BigQuery and then waits for a few minutes (to mimic a longer task). I publish 100 messages to a Pub/Sub topic, one message per second.
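For context, the worker has roughly the following shape. This is a simplified sketch, not the exact code from the repository: `handle_message`, `record_row`, and `WORK_SECONDS` are hypothetical names, and the BigQuery insert is replaced by a stand-in callable so the snippet runs without any Google Cloud dependencies. It assumes the background-function signature `(event, context)`, where `event["data"]` is base64-encoded and `context.event_id` carries the Pub/Sub message ID.

```python
import base64
import time
from datetime import datetime, timezone

WORK_SECONDS = 300  # hypothetical: simulated task duration (5 minutes)


def record_row(row):
    """Stand-in for the BigQuery insert (assumption: the real code writes the row to a table)."""
    print(row)


def handle_message(event, context, work_seconds=WORK_SECONDS,
                   insert=record_row, sleep=time.sleep):
    """Pub/Sub-triggered background function: log the delivery, then simulate long work."""
    # Pub/Sub payloads arrive base64-encoded in event["data"].
    payload = base64.b64decode(event.get("data", "")).decode("utf-8")
    row = {
        "message_id": context.event_id,  # same ID on every redelivery of a message
        "payload": payload,
        "received_at": datetime.now(timezone.utc).isoformat(),
    }
    insert(row)          # record the delivery first, so duplicates are visible later
    sleep(work_seconds)  # mimic the long-running task
    return row
```

Logging the message ID together with the receive time on every invocation is what makes it possible to spot redeliveries afterwards.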
It's emphasized that Pub/Sub can deliver the same message more than once, but I was surprised that 10 to 40 out of 100 messages were duplicated. This happened when the Cloud Function's response time was 5, 6, or 7 minutes. With a 4-minute response time, I didn't notice any duplicates.
I ran multiple tests for each of these response times. The time difference between the first and second delivery of the same message ranges from ~30 to ~600 seconds.
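To make the measurement concrete, this is a minimal sketch of how the duplicates and the gap between the first and second delivery can be computed from the logged rows. The function name `duplicate_gaps` and the sample data are hypothetical and purely illustrative (they are not my measured results); it assumes each logged delivery is available as a `(message_id, receive_time_in_seconds)` pair.

```python
from collections import defaultdict


def duplicate_gaps(deliveries):
    """Return {message_id: seconds between its first two deliveries} for duplicated messages."""
    by_id = defaultdict(list)
    for message_id, received_at in deliveries:
        by_id[message_id].append(received_at)

    gaps = {}
    for message_id, times in by_id.items():
        if len(times) > 1:  # delivered more than once
            first, second = sorted(times)[:2]
            gaps[message_id] = second - first
    return gaps


# Illustrative data only: (message_id, receive time in epoch seconds)
sample = [("a", 0.0), ("b", 1.0), ("a", 32.5), ("c", 2.0), ("b", 601.0)]
gaps = duplicate_gaps(sample)
print(f"{len(gaps)} duplicated messages, gaps: {gaps}")
```

Messages that were delivered only once do not appear in the result, so the size of the returned dictionary is the number of duplicated messages.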
The documentation at https://cloud.google.com/pubsub/docs/troubleshooting states: "Cloud Pub/Sub can send duplicate messages. For instance, when you do not acknowledge a message before its acknowledgement deadline has expired, Cloud Pub/Sub resends the message." For the subscription used by Cloud Functions, the acknowledgement deadline is 600 seconds (10 minutes), so as far as I understand, an expired deadline shouldn't be the reason.
Maybe my test case is unusual, or maybe something else is going on.
I would be grateful for advice on how to handle this situation. Is this behavior normal, and how can I prevent duplicates (without using Dataflow)?