High RU charge with case-insensitive search

Question

I have a Cosmos DB collection with 4 million documents (~5GB). The following query reports a charge of 2.79 RUs:

SELECT * FROM c WHERE c.type='type1' and STRINGEQUALS(c.name,'abc',false)

But the same query with case-insensitive search (replacing false with true) costs 1228 RUs.

Is there an explanation for why the case-insensitive query is roughly 440 times more expensive than the case-sensitive query (1228 / 2.79 ≈ 440)? I'm surprised by this because the documentation states:

The RU charge for StartsWith and StringEquals is slightly higher with the case-insensitive option than without it.

Details:

Both queries return 0 results.
The partition key is type.
The logical partition type1 contains 2 million documents.
The name property has a different value for almost all 2 million documents.
The default indexing strategy is used ("path": "/*").

It could be that the emulator does not have all of the optimizations that Cosmos DB added over time. When I run this query over a collection of 100,000 documents I get almost equal RU usage (2.79 vs 2.80). I would expect a significant difference even with only 100,000 documents, so I think the RU charge from the emulator is not a good representation in this case. — 404
@404 I also had almost identical RU charges on the Emulator with smaller collections (a few thousand docs). I had hoped I could measure the charge for various queries with various data collections on the Emulator to save time and money. :-( — Mo B.
@404 I can now confirm that the same behavior is exhibited in a "real" Cosmos DB collection. 2.79 RUs for case-sensitive and 1228.01 RUs for case-insensitive search. — Mo B.
@404 I suspect that you observe no difference in RU consumption because either the first criterion c.type = 'type1' restricts the number of possible results to a small number or the cardinality of c.name is much smaller in your collection. — Mo B.
I tested without the c.type expression, so the query ran over my entire collection, filtering on an MD5 hash. When I tried the same with a new database of 12 million documents, the result was even worse than yours: 569604 RU. I'm quite surprised by that outcome, considering my 100k collection barely showed a difference. Adding an order by c.md5 to force an index scan lowers it to 7700 RU. — 404

Mo B. · Accepted Answer · 2021-02-17T20:29:17

This is what I found out so far:

Does Cosmos DB support efficient case-insensitive string comparison?

Unfortunately, the RU charge for case-insensitive STRINGEQUALS seems to be linear in the cardinality of the property (i.e. the number of distinct values for that property), which is really bad if you have lots of documents. The query above takes almost 1 s at a throughput of 10,000 RU/s. In contrast, the cost of case-sensitive string comparison appears to be independent of the size of the collection. See also this discussion.
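To make the "linear in the cardinality" observation concrete, here is a toy model in Python. It is not how Cosmos DB is implemented; it only assumes a sorted list of distinct values as a stand-in for an index, and the function names are made up for illustration. An exact match can binary-search the sorted values, while a case-insensitive match in this model has to fold and compare every distinct value.

```python
def exact_lookup(sorted_values, target):
    """Case-sensitive match via binary search. Returns (found, steps)."""
    lo, hi, steps = 0, len(sorted_values), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_values) and sorted_values[lo] == target
    return found, steps


def case_insensitive_lookup(sorted_values, target):
    """Case-insensitive match by visiting every distinct value. Returns (found, steps)."""
    folded = target.lower()
    found, steps = False, 0
    for value in sorted_values:
        steps += 1
        if value.lower() == folded:
            found = True
    return found, steps


# Hypothetical data: 100,000 distinct names, none of which equals "abc".
values = sorted(f"Name{i:06d}" for i in range(100_000))
exact_found, exact_steps = exact_lookup(values, "abc")
ci_found, ci_steps = case_insensitive_lookup(values, "abc")
print(exact_found, exact_steps)  # a handful of steps, logarithmic in the number of values
print(ci_found, ci_steps)        # one step per distinct value
```

In this model the case-insensitive cost grows with the number of distinct values even when nothing matches, which is consistent with the charges measured above, although the real engine may behave differently.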

What if I need efficient case-insensitive string comparisons?

For small collections (< 10,000 docs) case-sensitivity doesn't make that much of a difference. The same holds if including the partition key in the query restricts the potential result set to a much smaller number of documents.

For larger collections you could store a duplicate of each property that should support efficient case-insensitive search in lower case and do a case-sensitive search on the lower-case property instead.
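A minimal sketch of that workaround in Python, under a few assumptions: the duplicate property is called name_lower (a hypothetical name), the helper functions are made up for illustration, and the parameter list uses the name/value dictionary shape accepted by the azure-cosmos Python SDK's query_items. The actual write and query calls are left out, so the block runs without any SDK installed.

```python
def with_lowercase_copy(doc, fields=("name",), suffix="_lower"):
    """Return a copy of doc with a lower-cased duplicate of each string field."""
    out = dict(doc)
    for field in fields:
        value = doc.get(field)
        if isinstance(value, str):
            out[field + suffix] = value.lower()
    return out


def build_query(doc_type, name):
    """Build a case-sensitive query against the lower-cased duplicate property."""
    query = "SELECT * FROM c WHERE c.type = @type AND c.name_lower = @name"
    parameters = [
        {"name": "@type", "value": doc_type},
        {"name": "@name", "value": name.lower()},  # fold the search term the same way
    ]
    return query, parameters


# Normalize the document before writing it, then search with a folded term.
stored = with_lowercase_copy({"id": "1", "type": "type1", "name": "AbC"})
query, parameters = build_query("type1", "ABC")
print(stored)
print(query, parameters)
```

The search term has to be folded the same way as the stored copy. For non-ASCII text, Python's str.lower may not match whatever case folding Cosmos DB applies, so pick one folding rule and apply it on both the write path and the query path.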

You can vote for the Feature Request to support efficient case-insensitive queries here.
