
I need to delete very large collections in Firestore.

Initially I used client-side batch deletes, but the documentation changed and now discourages that approach with these notes:

Deleting collections from an iOS client is not recommended.

Deleting collections from a Web client is not recommended.

Deleting collections from an Android client is not recommended.

https://firebase.google.com/docs/firestore/manage-data/delete-data?authuser=0

I switched to a cloud function as recommended in the docs. The cloud function is triggered when a document is deleted and then deletes all documents in its subcollection, as shown in the "Node.js" section of the page linked above.
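
For reference, this is roughly the shape of the batched delete from that Node.js section (condensed here; the function name and batch-size parameter follow the docs' sample, the rest is paraphrased):

```js
const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// Condensed version of the docs' batched delete: repeatedly fetch up to
// `batchSize` documents and delete them in a batch until nothing is left.
async function deleteCollection(collectionPath, batchSize) {
  const query = db.collection(collectionPath)
    .orderBy('__name__')
    .limit(batchSize);

  while (true) {
    const snapshot = await query.get();
    if (snapshot.size === 0) return; // nothing left to delete
    const batch = db.batch();
    snapshot.docs.forEach((doc) => batch.delete(doc.ref));
    await batch.commit();
  }
}
```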

The problem I am running into now is that the cloud function only seems to manage around 300 deletes per second. With the maximum cloud function runtime of 9 minutes, that works out to at most 162,000 deletes per invocation. But the collection I want to delete currently holds 237,560 documents, so the cloud function times out about halfway through.

I cannot trigger the cloud function again with an onDelete trigger on the parent document, as that document has already been deleted (its deletion is what triggered the initial call of the function).

So my question is: what is the recommended way to delete large collections in Firestore? According to the docs it should happen server-side rather than client-side, but the recommended solution does not scale to large collections.

Thanks!

You have any backend? I don't see any limit when using, let's say, Node.js firebase-admin. – dAxx_
The backend is Cloud Functions. The user needs to be able to delete very large collections through an app. – Georg

1 Answer


When you have more work than can be performed in a single Cloud Function execution, you will need to either find a way to shard that work across multiple invocations, or continue the work in subsequent invocations after the first. This is not trivial, and you have to put some thought and work into constructing the best solution for your particular situation.

For a sharding solution, you will have to figure out how to split up the document deletes ahead of time, and have your master function kick off subordinate functions (probably via pubsub), passing each one the arguments it needs to figure out which shard to delete. For example, you might kick off one function whose sole purpose is to delete documents whose IDs begin with 'a', another for 'b', and so on, each querying for its shard and then deleting the results.
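
A minimal sketch of that fan-out, assuming documents are sharded by the first character of their IDs; the `delete-shard` topic name, the `parents/{parentId}/items` path, and the prefix set are all hypothetical and would need to match your data:

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { PubSub } = require('@google-cloud/pubsub');

admin.initializeApp();
const pubsub = new PubSub();

// Master: when the parent document is deleted, publish one Pub/Sub
// message per shard (here: first character of the document ID).
// Adjust the prefix set so it covers all of your actual document IDs.
exports.fanOutDelete = functions.firestore
  .document('parents/{parentId}')
  .onDelete(async (snap, context) => {
    const prefixes =
      'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'.split('');
    const path = `parents/${context.params.parentId}/items`;
    await Promise.all(prefixes.map((prefix) =>
      pubsub.topic('delete-shard').publishMessage({ json: { path, prefix } })
    ));
  });

// Subordinate: delete every document in its shard, batch by batch.
exports.deleteShard = functions.pubsub
  .topic('delete-shard')
  .onPublish(async (message) => {
    const { path, prefix } = message.json;
    const db = admin.firestore();
    const query = db.collection(path)
      .orderBy(admin.firestore.FieldPath.documentId())
      .startAt(prefix)
      .endAt(prefix + '\uf8ff') // everything whose ID starts with the prefix
      .limit(250);

    while (true) {
      const snapshot = await query.get();
      if (snapshot.size === 0) return; // shard is empty
      const batch = db.batch();
      snapshot.docs.forEach((doc) => batch.delete(doc.ref));
      await batch.commit();
    }
  });
```

With 62 alphanumeric prefixes, each subordinate invocation only has to delete a few thousand of the 237,560 documents, comfortably inside a single function's time limit.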

For a continuation solution, you might just start deleting documents from the beginning, go for as long as you can before timing out, remember where you left off, then kick off a subordinate function to pick up where the prior one stopped.
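
A sketch of the continuation approach, again with a hypothetical `continue-delete` topic name. Because each pass deletes what it reads, re-running the same query is itself the "remember where you left off" step: it simply returns the next remaining documents.

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { PubSub } = require('@google-cloud/pubsub');

admin.initializeApp();
const pubsub = new PubSub();

const TIMEOUT_SECONDS = 540;
const SAFETY_MARGIN_MS = 60 * 1000; // stop well before the hard timeout

exports.continueDelete = functions
  .runWith({ timeoutSeconds: TIMEOUT_SECONDS })
  .pubsub.topic('continue-delete')
  .onPublish(async (message) => {
    const { path } = message.json;
    const deadline = Date.now() + TIMEOUT_SECONDS * 1000 - SAFETY_MARGIN_MS;
    const db = admin.firestore();
    const query = db.collection(path).orderBy('__name__').limit(250);

    // Delete batch after batch until the collection is empty or we
    // approach the function's timeout.
    while (Date.now() < deadline) {
      const snapshot = await query.get();
      if (snapshot.size === 0) return; // all done, no continuation needed
      const batch = db.batch();
      snapshot.docs.forEach((doc) => batch.delete(doc.ref));
      await batch.commit();
    }

    // Out of time: hand the rest of the work to a fresh invocation.
    await pubsub.topic('continue-delete').publishMessage({ json: { path } });
  });
```

The initial message can still be published from your existing onDelete trigger; unlike the deletes themselves, publishing a single message is quick and will not time out.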

You should be able to use one of these strategies to limit the amount of work done per function invocation, but the implementation details are entirely up to you to work out.

If, for some reason, neither of these strategies is viable, you will have to manage your own server (perhaps via App Engine) and message it (via pubsub) to perform a single unit of long-running work in response to a Cloud Function.