Cloud Firestore schema for text annotation application

Question

I have an application for reading and annotating texts and wonder how best to structure the underlying Firestore database.

The application itself is a relatively simple ReactJS SPA running inside the browser. All users can independently upload text documents into the system and then annotate these documents in the user interface. To annotate, the user opens the document, clicks on a word and then enters some metadata about that word in a pop-up window. The system then highlights every occurrence of that word in all documents of that user, with a different colour depending on the metadata provided.

My original plan was to create two independent collections:

A collection /documents, which will contain one Firestore document for each text document uploaded by the user. We expect an average of 200 text documents per user, each with up to 200 KB of data and referencing up to 1,000 annotated words.
A collection /words, which will contain one Firestore document for each word annotated by the user. We expect an average of 30,000 annotations per user, each with around 500 bytes of data.

I am now somewhat concerned that such a database schema would entail relatively high operational costs, since the relevant annotations would have to be loaded from the database each time a text document is displayed (and the Firestore Blaze plan would bill me $0.06 per 100,000 reads).
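
To put rough numbers on this concern, here is a minimal sketch. It assumes one billed read per annotation document and the read price quoted above; the helper name estimateReadCost is made up for illustration.

```javascript
// Price quoted in the question: $0.06 per 100,000 document reads.
const PRICE_PER_100K_READS = 0.06;

// Hypothetical helper: estimated cost in USD for a given number of reads.
function estimateReadCost(reads, pricePer100k = PRICE_PER_100K_READS) {
    return (reads / 100000) * pricePer100k;
}

// Opening one text document that references up to 1,000 annotated words,
// assuming one read per annotation.
const perOpen = estimateReadCost(1000);

// Loading all 30,000 annotations of an average user once.
const perFullLoad = estimateReadCost(30000);

console.log("Per document open (USD):", perOpen.toFixed(4));
console.log("Per full annotation load (USD):", perFullLoad.toFixed(4));
```

Under these assumptions a single document open costs well under a tenth of a cent, so the total depends mainly on how often documents are opened and on how many users there are.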

Is there perhaps a better (more cost-effective) way to structure this database?

Andrew Andrew · Accepted Answer · 2021-01-08T09:25:22

You should fetch only data that has changed and serve everything else from the SDK's local cache. Fetching the latest version of a single document from the server costs one read operation. By default, a get() call attempts to fetch the most recent snapshot of a document from the server whenever possible; on platforms with offline support, the SDK falls back to the offline cache only if the network is unavailable or the request times out. You can change this behaviour with the source option, for example to force a read from the cache:

var docRef = db.collection("cities").doc("SF");

// Valid options for source are 'server', 'cache', or
// 'default'. See
// https://firebase.google.com/docs/reference/js/firebase.firestore.GetOptions
// for more information.
var getOptions = {
    source: 'cache'
};

// Get a document, forcing the SDK to fetch from the offline cache.
docRef.get(getOptions).then(function(doc) {
    // Document was found in the cache. If no cached document exists,
    // an error will be returned to the 'catch' block below.
    console.log("Cached document data:", doc.data());
}).catch(function(error) {
    console.log("Error getting cached document:", error);
});
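
Building on the snippet above, a cache-first read with a server fallback could look roughly like this. It is only a sketch: getCacheFirst is a hypothetical helper, not part of the Firebase SDK, and it assumes docRef is a DocumentReference from the same namespaced API used above.

```javascript
// Hypothetical helper (not part of the Firebase SDK): try the local cache
// first and fall back to the server only when the cache has no copy.
async function getCacheFirst(docRef) {
    try {
        // Rejects if the document is not available in the local cache.
        return await docRef.get({ source: 'cache' });
    } catch (cacheError) {
        // Cache miss: fetch the document from the server instead.
        return await docRef.get({ source: 'server' });
    }
}
```

It would be called as getCacheFirst(db.collection("words").doc(wordId)), where wordId is a placeholder. Note that a cached copy can be stale, so this only helps if you have some other way of knowing when a document has changed, and how much of the cache survives a page reload depends on whether offline persistence is enabled.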
