1 vote

I am using the stored procedure below to delete items from a Cosmos DB collection.

function bulkDeleteStoredProcedure(query) {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var response = getContext().getResponse();
    var responseBody = {
        deleted: 0,
        continuation: true
    };

    // Validate input.
    if (!query) throw new Error("The query is undefined or null.");

    tryQueryAndDelete();

    // Recursively runs the query w/ support for continuation tokens.
    // Calls tryDelete(documents) as soon as the query returns documents.
    function tryQueryAndDelete(continuation) {
        var requestOptions = {continuation: continuation};

        var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, retrievedDocs, responseOptions) {
            if (err) throw err;

            if (retrievedDocs.length > 0) {
                // Begin deleting documents as soon as documents are returned from the query results.
                // tryDelete() resumes querying after deleting; no need to page through continuation tokens.
                //  - this is to prioritize writes over reads given timeout constraints.
                tryDelete(retrievedDocs);
            } else if (responseOptions.continuation) {
                // Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
                tryQueryAndDelete(responseOptions.continuation);
            } else {
                // Else if there are no more documents and no continuation token - we are finished deleting documents.
                responseBody.continuation = false;
                response.setBody(responseBody);
            }
        });

        // If we hit execution bounds - return continuation: true.
        if (!isAccepted) {
            response.setBody(responseBody);
        }
    }

    // Recursively deletes documents passed in as an array argument.
    // Attempts to query for more on empty array.
    function tryDelete(documents) {
        if (documents.length > 0) {
            // Delete the first document in the array.
            var isAccepted = collection.deleteDocument(documents[0]._self, {}, function (err, responseOptions) {
                if (err) throw err;

                responseBody.deleted++;
                documents.shift();
                // Delete the next document in the array.
                tryDelete(documents);
            });

            // If we hit execution bounds - return continuation: true.
            if (!isAccepted) {
                response.setBody(responseBody);
            }
        } else {
            // If the document array is empty, query for more documents.
            tryQueryAndDelete();
        }
    }
}

While executing this stored procedure I am getting the following error:

Failed to execute stored procedure BulkDelete for container Notifications: {"code":429,"body":{"code":"429","message":"Message: {\"Errors\":[\"Request rate is large. More Request Units may be needed, so no changes were made. Please retry this request later. Learn more: http://aka.ms/cosmosdb-error-429\"]}\r\nActivityId: cc616784-03ee-4b10-9481-d62c26e496e4, Request URI: /apps/2268c937-d7b4-449e-9d76-a2d50d5d3546/services/df84607d-8553-4938-aa0d-913563078a93/partitions/b37017a9-ab2c-4a88-bb51-0ae729299a7e/replicas/132314907336368334p/, RequestStats: \r\nRequestStartTime: 2020-05-20T07:55:16.8899325Z, RequestEndTime: 2020-05-20T07:55:17.5299234Z, Number of regions attempted:1\r\nResponseTime: 2020-05-20T07:55:17.5299234Z, StoreResult: StorePhysicalAddress: rntbd://cdb-ms-prod-northeurope1-fd25.documents.azure.com:14307/apps/2268c937-d7b4-449e-9d76-a2d50d5d3546/services/df84607d-8553-4938-aa0d-913563078a93/partitions/b37017a9-ab2c-4a88-bb51-0ae729299a7e/replicas/132314907336368334p/, LSN: 400340, GlobalCommittedLsn: 400339, PartitionKeyRangeId: , IsValid: True, StatusCode: 429, SubStatusCode: 3200, RequestCharge: 0.38, ItemLSN: -1, SessionToken: , UsingLocalLSN: False, TransportException: null, ResourceType: StoredProcedure, OperationType: ExecuteJavaScript\r\n, SDK: Microsoft.Azure.Documents.Common/2.11.0"},"headers":{"access-control-allow-credentials":"true","access-control-allow-origin":"https://cosmos.azure.com","content-type":"application/json","lsn":"400340","strict-transport-security":"max-age=31536000","x-ms-activity-id":"cc616784-03ee-4b10-9481-d62c26e496e4","x-ms-cosmos-llsn":"400340","x-ms-cosmos-quorum-acked-llsn":"400340","x-ms-current-replica-set-size":"4","x-ms-current-write-quorum":"3","x-ms-gatewayversion":"version=2.11.0","x-ms-global-committed-lsn":"400339","x-ms-number-of-read-regions":"1","x-ms-quorum-acked-lsn":"400340","x-ms-request-charge":"0.38","x-ms-retry-after-ms":"8538","x-ms-schemaversion":"1.9","x-ms-serviceversion":"version=2.11.0.0","x-ms-substatus":"3200","x-ms-transport-request-id":"120","x-ms-xp-role":"1","x-ms-throttle-retry-count":5,"x-ms-throttle-retry-wait-time-ms":32087},"activityId":"cc616784-03ee-4b10-9481-d62c26e496e4","substatus":3200,"retryAfterInMs":8538}

How can I fix this issue? Is something wrong with the stored procedure?

What is your RU set to? If it's too low for what you're attempting to do, you'll see throttling. You're seeing an 8-second backoff, which likely means you keep sending requests to Cosmos DB (some combination of queries and deletes), causing your backoff time to keep growing (and to grow beyond the 5-second execution bound for a stored procedure). – David Makogon
@DavidMakogon Throughput RU is set to 1000. – Rakesh Kumar

3 Answers

4 votes

Cosmos DB returns 429 when the current aggregate RUs plus the RUs for your query would exceed the threshold value you have set. For example, if your threshold is 400 and so far you have used 380 RUs, and the next query needs 22 RUs to complete, Cosmos will reject the query with code 429. If the next query only needs 3 RUs, it will succeed. One second later, the cumulative RU value resets to zero, and the 22 RU query will succeed.

If you get a 429, you will also receive an "x-ms-retry-after-ms" header containing a number. You should wait that number of milliseconds before retrying your query.

https://docs.microsoft.com/en-us/rest/api/cosmos-db/common-cosmosdb-rest-response-headers
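
One way to apply that here is to drive the stored procedure from the client, re-executing while the sproc reports continuation: true and backing off when a 429 surfaces. This is a minimal sketch using the JavaScript SDK (@azure/cosmos); the database/container names, the sproc id "bulkDelete", and the err.headers["x-ms-retry-after-ms"] lookup are assumptions, not taken from the question:

const { CosmosClient } = require("@azure/cosmos");

// Assumed names: endpoint/key environment variables, database "mydb",
// container "Notifications", and the sproc registered as "bulkDelete".
const client = new CosmosClient({
    endpoint: process.env.COSMOS_ENDPOINT,
    key: process.env.COSMOS_KEY
});
const container = client.database("mydb").container("Notifications");
const sproc = container.scripts.storedProcedure("bulkDelete");

async function bulkDeleteWithRetry(query, partitionKey) {
    let continuation = true;
    while (continuation) {
        try {
            // A stored procedure executes against a single partition key value.
            const { resource } = await sproc.execute(partitionKey, [query]);
            continuation = resource.continuation; // set by the sproc's responseBody
            console.log("Deleted " + resource.deleted + " documents this round.");
        } catch (err) {
            if (err.code !== 429) throw err;
            // Honor the server's retry hint before trying again.
            const waitMs = Number((err.headers && err.headers["x-ms-retry-after-ms"]) || 1000);
            await new Promise(function (resolve) { setTimeout(resolve, waitMs); });
        }
    }
}

Note that your error output already shows x-ms-throttle-retry-count: 5, i.e. the SDK/gateway had retried five times before surfacing the 429, so a client-side loop like this mostly buys you longer, header-driven waits between attempts.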

Alternatively, you can avoid the 429 by increasing the threshold (this will also increase the cost of the service). You have to decide whether you prefer to retry or to increase the threshold; this depends on the nature of your application.

RUs, or Request Units, are calculated by the Cosmos DB service based on the amount of work the service needs to do: a combination of how large your index is, how much data is being transferred, how much CPU, disk, and memory you use, and so on. Charging for RUs is a way for Cosmos to understand the workload you are going to run and to make the necessary backend changes to accommodate your performance needs; the per-second cost for Cosmos is based on your RU threshold setting. The RU calculation gets more complex if you are reading and writing from different regions around the world.

You can reduce the RU cost of your queries by restructuring the data in your index. If you fan out to multiple partitions, the queries run in parallel, which does more work in a shorter time period. Reducing (or increasing) the kilobytes that move across the network, memory, and CPU will likewise change the RUs.
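
To see what a given query actually costs, read the request charge off the response. A minimal sketch with the JavaScript SDK; the container handle and the query text are placeholders:

// Report how many RUs a query consumed.
async function measureQueryCost(container) {
    const { resources, requestCharge } = await container.items
        .query("SELECT * FROM c WHERE c.type = 'notification'")
        .fetchAll();
    console.log(resources.length + " documents, " + requestCharge + " RUs");
}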

3 votes

The 429 error is caused by too many requests, not by anything wrong in your stored procedure.

That said, stored procedures are best suited for write-heavy operations, not read- or delete-heavy ones. Instead, you can use the Bulk Executor library SDK, which has a BulkDelete feature.

Here is the documentation.
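
For what it's worth, the Bulk Executor library itself ships for .NET and Java; in JavaScript, the v3 @azure/cosmos SDK exposes an equivalent bulk API. A sketch under the assumption that the container is partitioned on /pk:

// Delete a batch of documents through the bulk API
// (the service accepts at most 100 operations per bulk call).
async function bulkDeleteDocs(container, docs) {
    const operations = docs.map(function (doc) {
        return {
            operationType: "Delete",
            id: doc.id,
            partitionKey: doc.pk // assumed partition key path: /pk
        };
    });
    const results = await container.items.bulk(operations);
    // Each result carries a per-operation status code (204 = deleted).
    return results.filter(function (r) { return r.statusCode === 204; }).length;
}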

0 votes

The easiest way would be to increase the throughput in the Azure portal. Depending on how frequently you are going to perform such an action, you can increase the throughput, perform your operation, and reset it afterwards (if it is a one-off), or find your optimum throughput. You will have to experiment to find the right number, but if it is a one-off, try something like 1000 and multiply by 10 on each attempt until it works. Do not forget to reset the value back to what it was, otherwise you'll get a big bill :)
(You may also consider autoscale if your throughput is set to around 4000 or more, to gain some cost advantage.)
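
If you would rather script the change than click through the portal, the JavaScript SDK also exposes the offer (throughput) resource. A rough sketch only; the offer-matching logic here is an assumption, so verify it against your SDK version:

// Raise a container's provisioned throughput (manual throughput only).
async function setThroughput(client, newRUs) {
    const { resources: offers } = await client.offers.readAll().fetchAll();
    // Assumes a single offer in the account; otherwise match
    // offer.offerResourceId against the container's _rid first.
    const offer = offers[0];
    offer.content.offerThroughput = newRUs;
    await client.offer(offer.id).replace(offer);
}

Remember to call it again with the old value once the bulk delete finishes, for the billing reason mentioned above.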