I've been using the Azure SDK for .NET client (on .NET Core 3.1) to query collections by calling GetItemQueryIterator() on a container.

I’ve observed that the FeedResponses returned by the FeedIterator returned by GetItemQueryIterator correspond to physical partitions, but I haven’t found any confirmation of this in the docs.

Can someone confirm that:

  • If a partition key value isn’t specified in the query, the FeedIterator will return a FeedResponse for each physical partition in the collection?
  • And that if a partition key value is specified, the FeedIterator will return only one FeedResponse with results from the physical partition holding the specified logical partition?

If the above statements aren't true, are there any guarantees about the relationship between partitions (logical and/or physical) and FeedIterators and FeedResponses?

Thanks!

1 Answer

TL;DR - It is a safe bet that if you don't specify a partition key, your results will be an aggregate of the results from a given range of partitions.

It's important to remember that, for the most part, the logical-to-physical partition mapping is treated as an implementation detail, though in practice that's a grey zone (one most folks can safely ignore). Additionally, how a query works mechanically changes a good bit over time as improvements are made, though functionally it should not change. The SDK code is open source, so I can at least describe the behavior at this point in time, as you could see it in any of the SDK implementations.

If you provide a partition key, the answer is easy - your results will come from a single physical partition at a time. That partition might be different each time (though in practice it will be the same range), because partitions can split between queries. Your query might also be served by a different replica each time. The above is true for all flavors of queries that target a single partition.
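One way to check this empirically is to drain the iterator and count the FeedResponse pages. The sketch below is illustrative only: `PageCounter` and `CountPagesAsync` are hypothetical names, the `MaxItemCount` of 100 is an arbitrary choice, and it assumes you already have a `Container` from the v3 `Microsoft.Azure.Cosmos` package. Note that results are paged, so even a query scoped to one partition key can yield more than one FeedResponse when the result set exceeds the page size.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class PageCounter
{
    // Hypothetical helper: drains a query and reports how many
    // FeedResponse pages the FeedIterator produced.
    public static async Task<int> CountPagesAsync<T>(
        Container container, string queryText, PartitionKey? partitionKey)
    {
        var options = new QueryRequestOptions
        {
            // null means no partition key is supplied (cross-partition query).
            PartitionKey = partitionKey,
            // Upper bound on items per page; 100 is an arbitrary value for this sketch.
            MaxItemCount = 100
        };

        FeedIterator<T> iterator = container.GetItemQueryIterator<T>(
            new QueryDefinition(queryText), requestOptions: options);

        int pages = 0;
        while (iterator.HasMoreResults)
        {
            // Each ReadNextAsync call returns one FeedResponse (one page).
            FeedResponse<T> page = await iterator.ReadNextAsync();
            pages++;
            Console.WriteLine(
                $"page {pages}: {page.Count} items, {page.RequestCharge} RU");
        }
        return pages;
    }
}
```

Calling it once with `new PartitionKey("some-value")` and once with `null` lets you compare the page counts for the two cases on your own container.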

Cross-partition queries get more interesting. By and large, they go through a pipeline that merges results according to the query plan. ORDER BY queries, for example, go from physical partition to physical partition grabbing pages until they have enough to guarantee that ordering is maintained. Aggregates do something similar. This of course comes with the overhead of having to talk to multiple partitions before results can be served, plus more resources consumed on the client to serve the request, so we don't recommend heavy cross-partition queries in hot-path code (save the result as a materialized view instead, etc.). There are a few cases where the query skips this pipeline and each response corresponds to a page served over the network, but usually only for queries with plain WHERE clauses.
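To make the ORDER BY merging idea concrete, here is a minimal, self-contained sketch. It is not the SDK's actual implementation: `OrderByMergeSketch` is a hypothetical name, in-memory lists stand in for pages fetched from each physical partition, and it assumes each partition returns its own results already sorted. It only shows why the client has to look at every partition before it can emit the next item in order.

```csharp
using System;
using System.Collections.Generic;

public static class OrderByMergeSketch
{
    // Each inner list stands in for one physical partition's results,
    // already sorted ascending.
    public static IEnumerable<int> Merge(
        IReadOnlyList<IReadOnlyList<int>> partitions)
    {
        // Index of the next unread item in each partition.
        var cursors = new int[partitions.Count];

        while (true)
        {
            int best = -1;
            for (int p = 0; p < partitions.Count; p++)
            {
                if (cursors[p] >= partitions[p].Count)
                {
                    continue; // this partition is drained
                }
                if (best < 0 ||
                    partitions[p][cursors[p]] < partitions[best][cursors[best]])
                {
                    best = p; // smallest head seen so far
                }
            }

            if (best < 0)
            {
                yield break; // every partition is drained
            }

            yield return partitions[best][cursors[best]++];
        }
    }

    public static void Main()
    {
        var partitions = new List<IReadOnlyList<int>>
        {
            new[] { 1, 4, 9 },
            new[] { 2, 3, 10 },
            new[] { 5, 6 }
        };

        // Prints: 1, 2, 3, 4, 5, 6, 9, 10
        Console.WriteLine(string.Join(", ", Merge(partitions)));
    }
}
```

In this sketch a page handed back to the caller can mix items from several partitions, which is why a FeedResponse from such a query need not map to one physical partition.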