TL;DR: Is there an API to get the size of a Firestore collection, instead of having to calculate it manually?

According to the Firestore docs, a document's size can be calculated from the data types of its fields.

Sample code illustrating this:

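import datetime


def calculate_doc_size(data):
    """Estimate the stored size of a field value, in bytes."""
    if isinstance(data, dict):
        size = 0
        for key, value in data.items():
            # Field name: UTF-8 byte length + 1, plus the size of its value.
            key_size = len(key.encode('utf-8')) + 1
            size += key_size + calculate_doc_size(value)
        return size
    elif isinstance(data, str):
        # Strings are sized by UTF-8 byte length, not character count.
        return len(data.encode('utf-8')) + 1
    elif isinstance(data, bool) or data is None:
        # Checked before int because bool is a subclass of int.
        return 1
    elif isinstance(data, (datetime.datetime, float, int)):
        return 8
    elif isinstance(data, list):
        return sum(calculate_doc_size(item) for item in data)
    # Fail loudly instead of silently returning None.
    raise TypeError(f'unsupported type: {type(data).__name__}')


def calculate_doc_name_size(path):
    """Size of the document name: each path segment + 1, plus 16 bytes."""
    size = 0
    for segment in path.split('/'):
        size += len(segment.encode('utf-8')) + 1
    return size + 16


document = {'a': {'a': 1, 'b': 2, 'c': None}, 'b': [1, 2, 3], 'c': [{'a': 1}]}
# 32 bytes of per-document overhead on top of the name and field sizes.
size = calculate_doc_name_size('database/account1/my_doc_id') + calculate_doc_size(document) + 32
print(size)  # prints 139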

So my idea was to calculate the document size in an on-write event using Cloud Functions, and to maintain a counter that holds the collection size (in bytes) at any given point in time.

By size I mean the storage consumed, not the number of keys in the document.
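
A minimal sketch of the counter idea, assuming each write handler receives the document path plus the field data before and after the write (None when the document does not exist on that side). The names `estimate_value_size`, `estimate_doc_size`, `size_delta`, `apply_write` and the in-memory `usage` dict are hypothetical stand-ins; in a real Cloud Function the counter would presumably live in Firestore and be updated atomically rather than in a Python dict.

```python
import datetime

# Hypothetical in-memory stand-in for a per-account counter document.
usage = {}


def estimate_value_size(value):
    """Estimate the stored size of one field value, in bytes."""
    if isinstance(value, dict):
        return sum(len(k.encode('utf-8')) + 1 + estimate_value_size(v)
                   for k, v in value.items())
    if isinstance(value, str):
        return len(value.encode('utf-8')) + 1
    if isinstance(value, bool) or value is None:
        return 1
    if isinstance(value, (datetime.datetime, float, int)):
        return 8
    if isinstance(value, list):
        return sum(estimate_value_size(item) for item in value)
    raise TypeError(f'unsupported type: {type(value).__name__}')


def estimate_doc_size(path, data):
    """Document name size + field sizes + 32 bytes of overhead."""
    name_size = sum(len(seg.encode('utf-8')) + 1 for seg in path.split('/')) + 16
    return name_size + estimate_value_size(data) + 32


def size_delta(path, before, after):
    """Change in bytes caused by one write (create, update, or delete)."""
    old = estimate_doc_size(path, before) if before is not None else 0
    new = estimate_doc_size(path, after) if after is not None else 0
    return new - old


def apply_write(account_id, path, before, after):
    """Add the write's delta to the account's running total and return it."""
    usage[account_id] = usage.get(account_id, 0) + size_delta(path, before, after)
    return usage[account_id]
```

On a create, `before` is None, so the full estimated size is added; on a delete, `after` is None and the same amount is subtracted. Applying only the delta means the counter never has to re-read the whole collection.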

Is there a better way to get the size of a Firestore collection or document?

Use case:
Say I want to limit each account by storage space, the way Gmail does (15 GB per user). Each account would be a collection identified by account_id. Example:

- database
    - account 1
        - collection 1
        - collection 2
    - account 2
        - collection 1
        - collection 2
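
One way the quota check for this layout might look, assuming a per-account byte counter is already being maintained. `QUOTA_BYTES` and `write_allowed` are hypothetical names, and the 15 GiB figure simply mirrors the Gmail example.

```python
# Assumed limit for illustration: 15 GiB per account.
QUOTA_BYTES = 15 * 1024 ** 3


def write_allowed(current_bytes, delta_bytes, quota_bytes=QUOTA_BYTES):
    """Return True if a write changing usage by delta_bytes should be accepted.

    Writes that shrink or do not change usage are always accepted, so an
    account that is already over quota can still delete data.
    """
    if delta_bytes <= 0:
        return True
    return current_bytes + delta_bytes <= quota_bytes
```

Letting deletions through unconditionally is a design choice in this sketch: otherwise an account over its limit could never free up space.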

Related post: Calculating size of Google Firestore documents

1 Answer

Is there a better way to get the firestore collection / document size?

Nope, that's pretty much what I'd do. Well... in general I'd try to steer away from such operations, as they are highly volatile. But if you really need to know how much space the documents in a collection take up, summing the size of each document is the way to calculate it, and Cloud Functions seems like a natural technology for performing that calculation on every update.