5
votes

This is a question about the latest Firebase Cloud Firestore. The documentation says:

It also allows for expressive queries. Queries scale with the size of your result set, not the size of your data set, so you'll get the same performance fetching 1 result from a set of 100, or 100,000,000.

This statement is not clear to me. Can you explain a little more about this use case?


2 Answers

22
votes

firebaser here

In most databases (including Firebase's own Realtime Database), query performance depends on a combination of the number of items you request and the size of the collection you request the items from.

So:

  1. If you request 10 items out of 1 million items, that will be faster than if you request 1000 items out of 1 million items.
  2. If you request 10 items out of 1 million items, that will be faster than if you request 10 items out of 100 million items.

The performance difference for #1 is expected: the data transfer alone makes it hard to miss. Since #2 depends on server-side processing, developers sometimes forget about it. Many relational DBMSs optimize this very well, so the difference is often only logarithmic in the collection size. But with a sufficiently large collection, even log(n) performance is going to be noticeable.
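To give a rough feel for the log(n) point, here is a minimal sketch that counts how many probes a plain binary search needs over a sorted key space. The function name `probes_to_find` is made up for illustration, and this is not how any particular database works internally (real indexes are typically B-trees with a much higher fan-out, so the absolute numbers are smaller). It only shows the shape of the growth.

```python
def probes_to_find(n: int, target: int) -> int:
    """Binary-search the sorted keys 0..n-1 for `target` and count the probes.

    range(n) stands in for a sorted index; no list of n items is allocated.
    """
    lo, hi, probes = 0, n, 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1  # one comparison against the key stored at position mid
        if mid < target:
            lo = mid + 1
        else:
            hi = mid
    return probes


# Hypothetical collection sizes, matching the examples in this answer.
for size in (1_000_000, 100_000_000):
    print(size, probes_to_find(size, size - 1))
```

The probe count goes up by only a handful of steps when the collection grows 100x, which is why the difference is easy to overlook until the collection gets very large.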

Cloud Firestore scales horizontally, which means that rule #2 from above doesn't apply:

  • If you request 10 items out of 1 million items, it will take the same time as requesting 10 items out of 100 million items.

This is because of the way Firestore's query system is designed. While you may not be able to model every query directly from a relational data model to the Firestore data model, if you can define your use-cases in terms of a Firestore query it is guaranteed to execute in a time relative only to the number of results you request. (paraphrasing Gil's comment here)

11
votes

This may be worded a bit confusingly. It is not a use case in the classic sense; it's just a statement about the performance of Firestore.

It basically says that it does not matter whether you request 1 item out of 100 or 1 item out of 100,000,000: it will be equally fast. Here 1 is your result set and 100 or 100,000,000 is your data set. So requesting 1 item out of 100,000,000 will be faster than requesting 50 items out of 100.

I hope this makes it a bit clearer!