4
votes

I have a stored procedure which gives me a document count (count.js on github). I have partitioned my collection. Due to this, I now have to pass the partition key in as an option to run the stored procedure.

Can and how should I enable crosspartition queries in the stored procedure (ie, collection(EnableCrossPartitionQuery = true)) so that I don't have to specify the partition key?

2

2 Answers

3
votes

There is no way to do fan-out stored procedure execution in DocumentDB. The run against a single partition. I ran into this dilemma when trying to switch to partitioned collections and had to make some adjustments. Here are some options:

  1. Download a 1 for every record and sum/count them client-side

  2. Rerun the stored procedure for each unique partition key. In my case, this was not as bad as it sounds since the partition key is a tenantID and I only have a dozen of those and only expect a few hundred max.

  3. I'm not sure about this one since I haven't tried it with partitioned collections, but each query now returns the resource usage of the collection in the x-ms-resource-usage header. That header has a documentsSize sub-header. You could use that divided by the average size of your documents to get an approximate count. There may even be a count record in that header information by now.

  4. Also, there is an x-ms-item-count header but I'm not sure how that behaves. If you send a query for all the records in the entire partitioned collection and set the max-item-count to 1, you'll only get back one record and it shouldn't cost you a lot in RUs, but I don't know how that header behaves. Does it return a 1 in that case? Or does it return the total number of documents all the pages of the query would eventually return if you bothered to request every page. A quick experiment should confirm this.

0
votes

Below you can find some example code that should allow you to read all records cross partion. The magic is inside the doForAll function, and at the top you can see how it is called.

// SAMPLE STORED PROCEDURE
function sample(prefix) {

    var share = { counter: 0, hasEntityName : 0, isXXX: 0, partitions: {}, prefix };

    doForAll({
        filter: function limiter(record){
            if (record && record.entityName === 'XXX') return true;
            else return false;
        },
        callback: function handleRecord(record) {
            //Keep track of this partition...
            let partitionKey = record.partitionKey;
            if (share.partitions[partitionKey])
                share.partitions[partitionKey]++;
            else 
                share.partitions[partitionKey] = 1;

            //update some counters...
            share.counter++;
            if (record.entityName !== undefined) share.hasEntityName++;
            if (record.entityName === 'XXX') share.isXXX++;
        },
        finaly: function whenAllIsDone() {
            console.log("counter = " + share.counter + ". ");
            console.log("has entity name: "+ share.hasEntityName+ ". ")
            console.log("is XXX: " + share.isXXX+ ". ")
            var parts = Object.getOwnPropertyNames(share.partitions)
            console.log("partition keys: " + parts.length + " ...");

            getContext()
                .getResponse()
                .setBody(share);
        }
    });


    //The magic function...
    //also see: https://azure.github.io/azure-cosmosdb-js-server/Collection.html
    function doForAll(task, ctoken) {

        if (!task) throw "Expected one parameter of type: { filter?: (rec?)=>boolean, callback?: (rec?) => void, finaly?: () => void }";
        //Note:
        //the "__" symbol is an alias for var collection = getContext().getCollection(); = aliased by __

        var result = getContext()
            .getCollection()
            .chain()
                .filter(task.filter || function (rec) { return true; })
                .map(task.callback || function (rec) { return undefined; })
            .value({ continuation: ctoken }, function afterBatchCallback (err, feed, options) {
                if (err) throw err;
                if (options.continuation)
                    doForAll(task, options.continuation);
                else if (task.finaly)
                    task.finaly();
            });

        if (!result.isAccepted)
            throw "catastrophic failure";
    }

}

PS: it may to know how the data looks like that is used for the example. This is an example of such a document:

{ 
    "id": "123",
    "partitionKey": "PART_1",
    "entityName": "EXAMPLE_ENTITY",
    "veryInterestingInfo": "The 'id' property is also the collections id, the 'partitionKey' property happens to be the collections partition key, and all the records in this collection have a 'entityName' property which contains a (non-unique) string"
}