I am creating a database collection whose documents each have a subcollection holding old versions of the root-level content. The collection structure will look pretty similar to the structure from this question:

Firestore-root
|
--- content (collection)
    |
    --- contentId (google generated) (document)
        |   // latest fields here
        ----|
            --- history (subcollection)
                |
                --- oldContentId
                    // old field/values here
                --- oldContentId2
                    // old field/values here
       

So if I wanted to get an old version of the content, I could call:


const oldContent = await fs.collection("content").doc(contentId).collection("history").doc(oldContentId).get();

I'd like to use monotonic-like ids for the document ids in the history subcollection. I'm aware of the advice to avoid the use of such ids to avoid hotspotting. What is not clear to me is if this advice remains the same for ids for documents in subcollections. My guess is it does, but just want to be clear about it.

So for example say I use google generated ids for the subcollection and get:


# ggdId == google generated Id
content/ggdId-1/history/ggdId-1
content/ggdId-1/history/ggdId-2
...
content/ggdId-1/history/ggdId-N

content/ggdId-2/history/ggdId-1
content/ggdId-2/history/ggdId-2
...
content/ggdId-2/history/ggdId-N

Will Google Cloud split this data better than if I use monotonic-like ids in the subcollection:

content/ggdId-1/history/1
content/ggdId-1/history/2
...
content/ggdId-1/history/N

content/ggdId-2/history/1
content/ggdId-2/history/2
...
content/ggdId-2/history/N

Finally, is the advice a hard rule, or is there nuance depending on how the collection/subcollection is used? Say I don't anticipate high read/write rates on the history subcollection: would that mean I could use monotonic-like ids?

2 Answers

1
votes

What is not clear to me is if this advice remains the same for ids for documents in subcollections.

The advice to avoid monotonically increasing IDs applies to all collections, regardless of how they are nested. Sequential IDs just don't scale the way Firestore requires. There is really no workaround for this.

If you're sure that the throughput isn't going to be so high that it will cause problems, then do what you want. But it's best to use randomly generated IDs, and impose ordering based only on the fields of the documents.
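As a rough illustration of "random IDs, ordering from fields", here is a minimal in-memory sketch. The helper names (`randomId`, `makeHistoryEntry`, `newestFirst`) and the `version` field are assumptions made up for this example, not part of the Firestore SDK or of the question's schema.

```javascript
const { randomBytes } = require("node:crypto");

const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

// Hypothetical helper: a random 20-character alphanumeric ID.
function randomId(length = 20) {
  let id = "";
  while (id.length < length) {
    for (const byte of randomBytes(length)) {
      // Skip bytes >= 248 (62 * 4) so every character is equally likely.
      if (byte < 248 && id.length < length) {
        id += ALPHABET[byte % ALPHABET.length];
      }
    }
  }
  return id;
}

// The document key is random; the ordering lives in the `version` field
// (an assumed field name).
function makeHistoryEntry(fields, version) {
  return { id: randomId(), data: { ...fields, version } };
}

// Sort by the `version` field, newest first, ignoring the IDs entirely.
function newestFirst(entries) {
  return [...entries].sort((a, b) => b.data.version - a.data.version);
}

const entries = [1, 2, 3].map((v) =>
  makeHistoryEntry({ title: `draft ${v}` }, v)
);
console.log(newestFirst(entries).map((e) => e.data.version)); // [ 3, 2, 1 ]
```

With the real SDK, calling `.doc()` with no argument on the `history` collection reference produces an auto-generated ID, and `.orderBy("version", "desc")` on that collection returns the entries in version order.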

In a general sense, with cloud services that must scale massively, ordering is hard.

0
votes

Hotspots in Cloud Firestore writes are almost always the result of Firestore having to update its indexes, which it needs in order to meet its read/query performance guarantees.

If you use non-random IDs for documents, it increases the chances that Firestore hits a hotspot when updating its indexes. This depends on the indexes it has to update, and not in any way on whether the collections are global or subcollections.
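To see why sequential keys concentrate writes, the sketch below inserts IDs into a sorted list one at a time and records where each lands. This is only a toy model of a sorted key space, not Firestore's actual storage layout; the zero-padded counters and the helper names are assumptions for illustration.

```javascript
const { randomBytes } = require("node:crypto");

// Index at which `id` would be inserted into an already sorted list.
function insertPosition(sortedIds, id) {
  let i = 0;
  while (i < sortedIds.length && sortedIds[i] < id) i++;
  return i;
}

// Insert the IDs in write order and record each landing position.
function landingPositions(ids) {
  const sorted = [];
  const positions = [];
  for (const id of ids) {
    const pos = insertPosition(sorted, id);
    positions.push(pos);
    sorted.splice(pos, 0, id);
  }
  return positions;
}

// Sequential IDs, zero-padded so string order matches numeric order.
const sequential = Array.from({ length: 8 }, (_, i) =>
  String(i + 1).padStart(4, "0")
);
// Random IDs: 10 random bytes rendered as hex.
const random = Array.from({ length: 8 }, () =>
  randomBytes(10).toString("hex")
);

// Every sequential write lands at the very end of the sorted range.
console.log(landingPositions(sequential)); // [ 0, 1, 2, 3, 4, 5, 6, 7 ]
// Random writes land at varying positions across the range.
console.log(landingPositions(random));
```

In the sequential case every new write hits the same end of the range, which is the pattern that makes a hotspot likely under load.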

While using subcollections may reduce the number of writes to an index to just the writes to that subcollection, this may be countered if you use collection group queries (since those have a single index for all collections with the same name).
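A small tally makes the difference concrete. It follows the model described in this answer (one index per subcollection versus one shared index per collection name); the write log and the helper names are made up for the example.

```javascript
// Hypothetical write log: one full document path per write.
const writes = [
  "content/a/history/x1",
  "content/a/history/x2",
  "content/b/history/y1",
  "content/b/history/y2",
  "content/b/history/y3",
];

// Count writes per key, where `keyOf` maps a document path to the
// index that the write would touch.
function tally(paths, keyOf) {
  const counts = {};
  for (const path of paths) {
    const key = keyOf(path);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Collection scope: keyed by the full parent path of the document.
const parentPath = (path) => path.split("/").slice(0, -1).join("/");
// Collection-group scope: keyed by the collection ID only.
const collectionId = (path) => path.split("/").slice(-2, -1)[0];

console.log(tally(writes, parentPath));
// { 'content/a/history': 2, 'content/b/history': 3 }
console.log(tally(writes, collectionId));
// { history: 5 }
```

With the real SDK, a collection group query is issued through `collectionGroup("history")`, which reads across every subcollection named `history`.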