1
votes

TL;DR It's taking a lot of time to fetch a small number of documents

Scenerio:
I have a collection for each account and each account contains a projects sub collection and tasks sub collection. Each document in the tasks sub collection can further contain checklists in the checkLists sub collection

Note:

  • Projects can contain tasks which can in turn contain checklists.
  • Tasks can be created standalone i.e; it needn't always be part of a project.
  • projects and tasks are both top level sub collections and checkLists sub collection is nested inside every task.

Illustration:

someTopLevelDB
   |
   |____ accountId1
   |         |______projects
   |         |         |_______ projectId1
   |         |
   |         |______tasks
   |                  |________taskId1 (belongs to projectId1)
   |                  |           |
   |                  |           |________checkLists
   |                  |                         |
   |                  |                         |_____checkListId1
   |                  |
   |                  |________taskId2 (standalone)

Use case: When an user clicks duplicate project (from the UI), I have to create a replica of the entire project i.e; all tasks, checklists etc..

Code: The process to do so was slow and when I profiled the code, this snippet took a lot of time to execute. The snippet fetches all the tasks and its checklists

let db = admin.firestore();

function getTasks(accountId) {
    return db.collection('someTopLevelDB')
        .doc(accountId)
        .collection('tasks')
        .where('deleted', '==', false)
        .get();
}


function getCheckLists(accountId, taskId) {
    return db.collection('someTopLevelDB')
        .doc(accountId)
        .collection('tasks')
        .doc(taskId)
        .collection('checkLists')
        .where('deleted', '==', false)
        .get();
}


async function getTasksAndCheckLists(accountId) {
    try {
        let records = { tasks: [], checkLists: [] };

        // prepare tasks details
        const tasks = await getTasks(accountId);
        const tasksQueryDocumentSnapshot = tasks.docs;
        for (let taskDocumentSnapshot of tasksQueryDocumentSnapshot) {
            const taskId = taskDocumentSnapshot.id;
            const taskData = taskDocumentSnapshot.data();
            const taskDetails = {
                id: taskId,
                ...taskData
            };
            records.tasks.push(taskDetails);

            // prepare check list details
            checkListQueryDocumentSnapshot = (await getCheckLists(accountId, taskId)).docs;
            for (let checkListDocumentSnapshot of checkListQueryDocumentSnapshot) {
                const checkListId = checkListDocumentSnapshot.id;
                const checkListData = checkListDocumentSnapshot.data();
                const checkListDetails = {
                    id: checkListId,
                    ...checkListData
                };
                records.checkLists.push(checkListDetails);
            }
        }
        console.log(`successfully fetched ${records.tasks.length} tasks and ${records.checkLists.length} checklists`);
        return records;
    } catch (error) {
        console.log('Error fetching docs ====>', error);
    }
}




// Call the function to fetch records
getTasksAndCheckLists('someAccountId')
    .then(result => {
        console.log(result);
        return true;
    })
    .catch(error => {
        console.error('Error fetching docs ===>', error);
        return false;
    });

Execution stats:
successfully fetched 627 tasks and 51 checklists in 220.532 seconds

I came to a conclusion that retrieving checklists was slowing the entire process down as retrieval of tasks was fairly quick.

So my question is as follows:

  • Is there any way to optimize retrieving the documents in the above code?
  • Is there any way to retrieve the documents of the sub collection quicker by remodelling the data and using collectionGroup queries etc?

Thanks.

1
wouldn't it be better if you reduce the nested collections, read thisVipul Dessai
The documentation you have shared is for realtime database and not for cloud firestore. Both are different as firestore queries are shallow unlike RTDB. Storing checklists as a sub collection allow me to lazy load it on demand and provides me flexibility for pagination and sorting instead of storing it as one big array inside the task. The reason I haven't stored checkLists as a top level collection like projects and tasks is because projects and tasks are independent entities and can co exist without the other but it's not the case with checklists.Adarsh
@vipul I think maybe my illustration of data looked like RTDB. My apologies for that.Adarsh
My apologies actually, I mistook firestore for RTDB, yes the firestore grants the flexibility to store nested data.Vipul Dessai
just wanted to add this, there is db.collectionGroup function which might help you, if u get time u can check this outVipul Dessai

1 Answers

4
votes

The problem is caused by using await inside of your for loop here:

checkListQueryDocumentSnapshot = (await getCheckLists(accountId, taskId)).docs;

This causes your for loop to stall for as long as it takes to get the check lists of that particular task.

The way to avoid this is to process the check lists asynchronously using Promise chaining. As you loop over the tasks, you create the request for that task's check lists, add a listener to it's result and then send it and immediately move to the next task.

With your data structure, the check lists are related to their specific task on the server, but they aren't tied to them in your code above. When working asynchronously with the same data structure would mean that they will be out of order with your tasks if you are just using a standard array with push() (e.g. task B's checklist fetch may finish before task A's). To fix this, in the below code, I have nested the checklist under the taskDetails object so they are still linked.

async function getTasksAndCheckLists(accountId) {
    try {
        let taskDetailsArray = [];

        // fetch task details
        const tasks = await getTasks(accountId);

        // init Promise holder
        const getCheckListsPromises = [];

        tasks.forEach((taskDocumentSnapshot) => {
            const taskId = taskDocumentSnapshot.id;
            const taskData = taskDocumentSnapshot.data();
            const taskDetails = {
                id: taskId,
                checkLists: [], // for storing this task's checklists
                ...taskData
            };
            taskDetailsArray.push(taskDetails);

            // asynchronously get check lists for this task
            let getCheckListPromise = getCheckLists(accountId, taskId)
                .then((checkListQuerySnapshot) => {
                    checkListQuerySnapshot.forEach((checkListDocumentSnapshot) => {
                        const checkListId = checkListDocumentSnapshot.id;
                        const checkListData = checkListDocumentSnapshot.data();
                        const checkListDetails = {
                            id: checkListId,
                            ...checkListData
                        };

                        taskDetails.checkLists.push(checkListDetails);
                    });
                });

            // add this task to the promise holder
            getCheckListsPromises.push(getCheckListPromise);
        });

        // wait for all check list fetches - this is an all-or-nothing operation
        await Promise.all(getCheckListsPromises);

        // calculate the checklist count for all tasks
        let checkListsCount = taskDetailsArray.reduce((acc, v) => acc+v.checkLists.length, 0);

        console.log(`successfully fetched ${taskDetailsArray.length} tasks and ${checkListsCount} checklists`);
        return taskDetailsArray;
    } catch (error) {
        console.log('Error fetching docs ====>', error);
    }
}

With these changes, you should see the duration your function runs greatly reduce. Based on the timings you have provided, I'd guess it would drop to roughly 2-3 seconds.