0
votes

I am using Cosmos DB in Autopilot mode with a max RU/s set at the Collection for 20K RU/s. I'm using the SQL API. I have several hundred thousand records in this particular partition. I have the default indexing policy, which is that everything is indexed. I have gone into the Data Explorer within the Azure Portal and have run the following query:

SELECT count(1) FROM c where c.partitionKey = "12140" and c.uniqueId = "20cdb686-6959-4ed6-ac31-c7f414238f68" and c.timestampAsString = null

The Query Stats show an RU/s charge of 3.46. This is great!

I then run a very similar query, but I am testing c.timestampAsString for being not null:

SELECT count(1) FROM c where c.partitionKey = "12140" and c.uniqueId = "20cdb686-6959-4ed6-ac31-c7f414238f68" and c.timestampAsString != null

Total RU/s charges for the above query are 2989.73 RU/s. This is quite a bit higher. I was expecting something along the lines of the original query.

Can anyone explain why these query charges are so very different?

1
Hmm. Do you get the same type of result discrepancy when using IS_NULL(c.timestampAsString) and NOT IS_NULL(c.timestampAsString)? - David Makogon
@David Makogon, I was hoping you'd see this. And yes, I do get the same RU/s charge for NOT IS_NULL as I do for != null. - Rob Reagan
One more question: is this with the latest indexing rollout, or is this with an older collection? (there was a rollout + announcement that just happened) One of the optimizations explicitly called out (article link) is the inequality operator. - David Makogon
It is a collection that's two months old, so it pre-dates this improvement. The article you referenced recommends adding an index on the property in question. I am using the default indexing policy of indexing everything. Can I add an explicit additional index? - Rob Reagan
I'm not 100% positive that these updates apply to current collections (at least immediately); something I have to look into. And I'm pretty sure you don't need to explicitly add an index if you've set yourself up for automatic indexing. The key part of the article applying to inequality: "If there were inequality filters or filters on undefined values remaining, the query engine would be required to load each of these documents. Since inequality filters and filters on undefined values now utilize the index, we can avoid loading these documents and see a significant improvement in RU charge." - David Makogon

1 Answers

1
votes

The reason for this huge discrepancy in RU is how/whether we push filters down to the index. For queries like c.timestampAsString = null we do push this down to the index. For queries where you do aggregates and filters like c.timestampAsString != null we do not.

You can do some things however to optimize this query. If you create a second property called, "istimestampAsStringNull" and set to true to false this would give you the ability to do the query below and you should get roughly the same RU/s cost.

SELECT count(1) FROM c where c.partitionKey = "12140" and c.uniqueId = "20cdb686-6959-4ed6-ac31-c7f414238f68" and c. isTimestampAsStringNull == false

We are looking at this for a future optimization but for now this workaround above should work.

Hope this is helpful.