44
votes

It seems like 'select count(*) from c' in the SQL queries allowed by documentdb in the azure site and through the documentdb explorer (https://studiodocumentdb.codeplex.com/) is not supported. To date, the only way to get a record count that I have found is from code (see below). However, there are enough files in our collection now that this is crashing. Is there a way to get a count on how many documents in a collection that works more than my solution?

DocumentClient dc = GetDocumentDbClient();
var databaseCount = dc.CreateDatabaseQuery().ToList();
Database azureDb = dc.CreateDatabaseQuery().Where(d => d.Id == Constants.WEATHER_UPDATES_DB_NAME).ToArray().FirstOrDefault();

var collectionCount = dc.CreateDocumentCollectionQuery(azureDb.SelfLink).ToList();

DocumentCollection update = dc.CreateDocumentCollectionQuery(azureDb.SelfLink).Where(c => c.Id == "WeatherUpdates").ToArray().FirstOrDefault();

var documentCount = dc.CreateDocumentQuery(update.SelfLink, "SELECT * FROM c").ToList();

MessageBox.Show("Databases: " + databaseCount.Count().ToString() + Environment.NewLine
                +"Collections: " + collectionCount.Count().ToString() + Environment.NewLine
                + "Documents: " + documentCount.Count().ToString() + Environment.NewLine, 
                 "Totals", MessageBoxButtons.OKCancel); 
8

8 Answers

119
votes

This is now possible in the year 2017 of our lord.

SELECT VALUE COUNT(1) FROM c

[ 1234 ]

28
votes

Actually works at this point:

SELECT COUNT(c.id) FROM c
11
votes

Until the implementation of the "count" keyword, you should do your query in a store procedure on the server. Take care to not get all columns/properties in your query if you want only a count.

Select only the id like;

  dc.CreateDocumentQuery(update.SelfLink, "SELECT c.id FROM c")
11
votes

This is possible in the same way you write SQL query now,

SELECT VALUE COUNT(1) FROM myCollection

enter image description here

NOTE: COUNT(1) won't work for a huge datasets.

You can read more about supported queries from here

9
votes

Just to recap - here is example of Count Stored Procedure via JS with continuation support.

And here is one more tool for DocumentDb that's pretty neat: https://github.com/mingaliu/DocumentDBStudio/releases

Upd Mar 2017: In the latest DDB SDK see DDB Aggregates press release there is full support for basic aggregates, without GROUP BY though (for now). Here is GIT REpo with examples: https://github.com/arramac/azure-documentdb-dotnet/tree/master/samples/code-samples/Queries

6
votes

I did a test against a partitioned Document db collection with 200K entities in a single partition. The Collection is configured with 10K RU/second.

Client side queries:

  1. "SELECT VALUE COUNT(1) FROM c"

Time elapsed (ms): 2471 milliseconds Total Request Units consumed: 6143.35

Note: This is the fastest and cheapest option. But keep in mind that you would need to handle continuation on the client side and execute next query using the returned continuation token otherwise you may get partial result/count.

  1. "SELECT COUNT(c.id) FROM c"

Time elapsed (ms): 2589 Total RU: 6682.43

Note: This is very close but slightly slower and more expensive.

Server side / Stored Procedure:

  1. If you need a stored proc, there is one provided here: https://github.com/Azure/azure-cosmosdb-js-server/blob/master/samples/stored-procedures/Count.js

But beware it is problematic.. It internally reads all documents in the collection / partition just to calculate the count. As a result it is much slower and a lot more expensive!

Time elapsed (ms): 8584 milliseconds Total RU: 13419.31

  1. I updated the stored procedure provided in above link to improve the performance. Full Updated Count.js below. The updated stored proc performs way faster and cheaper than the original and it is on par with the best performing client side query (#1 above):

Time elapsed (ms): 2534 milliseconds Total RU: 6298.36

function count(filterQuery, continuationToken) {
    var collection = getContext().getCollection();
    var maxResult = 500000; 
    var result = 0;

    var q = 'SELECT \'\' FROM root';
    if (!filterQuery) {
        filterQuery = q;
    }

    tryQuery(continuationToken);

    function tryQuery(nextContinuationToken) {
        var responseOptions = { continuation: nextContinuationToken, pageSize: maxResult };

        if (result >= maxResult || !query(responseOptions)) {
            setBody(nextContinuationToken);
        }
    }

    function query(responseOptions) {
        return (filterQuery && filterQuery.length) ?
            collection.queryDocuments(collection.getSelfLink(), filterQuery, responseOptions, onReadDocuments) :
            collection.readDocuments(collection.getSelfLink(), responseOptions, onReadDocuments);
    }

    function onReadDocuments(err, docFeed, responseOptions) {
        if (err) {
            throw 'Error while reading document: ' + err;
        }

        result += docFeed.length;

        if (responseOptions.continuation) {
            tryQuery(responseOptions.continuation);
        } else {
            setBody(null);
        }
    }

    function setBody(continuationToken) {
        var body = { count: result, continuationToken: continuationToken };
        getContext().getResponse().setBody(body);
    }
}
0
votes

Currently does not exist. I had a similar scenario and we ended up adding a counter to a document attribute that gets updated every time a document gets added or deleted. You could even make these two steps as part of a store procedure or a trigger if you want atomicity.

0
votes

My code count solution is also working...once I just selected the id as a Papa Ours pointed out :) To get my original post to work, replace this line:

var documentCount = dc.CreateDocumentQuery(update.SelfLink, "SELECT * FROM c").ToList();

with this line:

var documentCount = dc.CreateDocumentQuery(update.SelfLink, "SELECT id FROM c").ToList()

I still like the idea of the stored procedure as it will work in the documentdb studio (really cool project :)) - https://studiodocumentdb.codeplex.com/