1
votes

In documentDB partitioned collections, you need to know a partitionKey value to be able to execute a stored procedure against it. However, to be able to do something like a count stored procedure it has to work across all partitions or be executed for each partition.

This means a stored procedure can't be executed to produce a list of partition keys (AFAIK). Running a query to return every value of partitionkey in all the documents, then deduplicating it client side is a highly inelegant, brute force, expensive operation. You cannot SELECT DISTINCT.

How can one produce a list of partition keys, without resorting to the SELECT partitionkey FROM docs approach?

1

1 Answers

2
votes

The most efficient way I know of to get an exact count on a partitioned collection is to use a query like this:

SELECT VALUE 1 FROM docs

Then count/sum the results that are returned or better yet, read the x-ms-item-count header.

I haven't tried it but if you want an approximate count of documents, I think you could use the x-ms-resource-usage that comes back from any request. The documentsSize value divided by the average document size (which you'd have to calculate somehow, maybe sampling?) would give you an approximate count. I'm assuming that those headers are available and accurate for partitioned collections though, so you may have to try this to be sure.

I don't know of one, but if there is some way to retrieve the current partition config (table showing what range of hash key values go to what partitions), then it might be possible to do a manual fan-out of sproc calls. However, if their implementation is anything like mine, operations during a re-balancing (when partitions are added or removed) are tricky and you'd have to guard against the config changing between when it was retrieved compared to when you faned out the sproc calls. I'd rather my design not rely upon knowing the current partition config.