
I'm using the new SDK to do a "Bulk" delete.

new CosmosClientOptions() { AllowBulkExecution = true }

And I want to delete anything where a specific date in the document is older than 3 years; i.e.,

Select  c.id,
        c.nameofPartitionKeyField 
from    c 
where   c.InvoiceDate < 3 years ago --???
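One option seems to be computing the cutoff client-side and passing it in as a parameter, sketched below (assuming InvoiceDate is stored as an ISO 8601 string; the partition key field is aliased to pk to match the code further down):

// Sketch: compute the cutoff in C# so no date math is needed in the SQL.
var cutoff = DateTime.UtcNow.AddYears(-3).ToString("o");   // ISO 8601
var query = new QueryDefinition(
        "SELECT c.id, c.nameofPartitionKeyField AS pk FROM c WHERE c.InvoiceDate < @cutoff")
    .WithParameter("@cutoff", cutoff);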

I can't use Time To Live here because all of the documents were loaded a year ago, covering the prior three years. Now, a year later, we want to delete the items that are four years old.

I am querying first to get the id and partition key needed:

tasks.Add(container.DeleteItemStreamAsync(item.id.ToString(), new PartitionKey(item.pk.ToString())));

and then

await Task.WhenAll(tasks);
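Put together, the whole thing looks roughly like this (a sketch of what I'm doing; query is the QueryDefinition from above):

var tasks = new List<Task>();
var feed = container.GetItemQueryIterator<dynamic>(query);
while (feed.HasMoreResults)
{
    foreach (var item in await feed.ReadNextAsync())
    {
        // One task per document; this is what starts returning 429s
        // once more than a couple dozen deletes are in flight.
        tasks.Add(container.DeleteItemStreamAsync(
            item.id.ToString(), new PartitionKey(item.pk.ToString())));
    }
}
await Task.WhenAll(tasks);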

But no matter what I do, if I pick more than 20 or 30 records, I get 429 Too Many Requests. This doesn't seem like "Bulk" if I can only do 20 records at a time. But I must be doing something incorrectly. I'm following all of the examples I've seen, yet no one ever provides a delete example, only insert. In any case, it shouldn't be much different.

I do have a stored procedure that works with continuation and will keep deleting items, but it seems slow. I need to delete about 6 million documents, and that approach doesn't get it done, or at least not quickly.


2 Answers


There's this note from the docs:

In cases where the provisioned request units are much lower than expected based on the amount of data, you might want to consider setting these to high values. The bulk operation will take longer but has a higher chance of completely succeeding due to the higher retries.

So it sounds like a complete solution would be:

  1. Set MaxRetryAttemptsOnRateLimitedRequests (and its companion MaxRetryWaitTimeOnRateLimitedRequests) to a high value.
  2. Use SemaphoreSlim or similar to restrict your batch size on the client side (see the sketch after this list).
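A minimal sketch of both steps together (my own code, not from the docs; connectionString, the database/container names, and itemsToDelete are placeholders, and the retry/concurrency numbers are arbitrary starting points to tune against your provisioned RUs):

var client = new CosmosClient(connectionString, new CosmosClientOptions
{
    AllowBulkExecution = true,
    // Step 1: retry 429s far more aggressively than the default of 9.
    MaxRetryAttemptsOnRateLimitedRequests = 60,
    MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromMinutes(5)
});
var container = client.GetContainer("mydb", "mycontainer");

// Step 2: cap the number of in-flight deletes client-side.
var throttle = new SemaphoreSlim(50);
var tasks = new List<Task>();
foreach (var item in itemsToDelete)          // your (id, pk) query results
{
    await throttle.WaitAsync();
    tasks.Add(DeleteOneAsync(container, item.id, item.pk, throttle));
}
await Task.WhenAll(tasks);

static async Task DeleteOneAsync(Container container, string id, string pk, SemaphoreSlim throttle)
{
    try
    {
        using var response = await container.DeleteItemStreamAsync(id, new PartitionKey(pk));
        // response.StatusCode can be checked here for per-item handling.
    }
    finally
    {
        throttle.Release();
    }
}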

Alternatively, you could set a short TTL on each document and let Cosmos DB delete them automagically using your "leftover" RUs.
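For the TTL route, something like this per document could work (a sketch: PatchItemAsync needs a recent v3 SDK with partial document update support, and the container's DefaultTimeToLive has to be enabled, e.g. set to -1, for a per-item ttl to take effect):

// Sketch: stamp a short per-item TTL and let Cosmos DB expire the document.
await container.PatchItemAsync<object>(
    id: item.id.ToString(),
    partitionKey: new PartitionKey(item.pk.ToString()),
    patchOperations: new[] { PatchOperation.Add("/ttl", 60) });  // seconds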


You can try the Bulk Executor library SDK. It has a BulkDelete feature.

Here is the documentation.

Here is the sample from GitHub.
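Roughly how BulkDeleteAsync is called, based on my reading of those docs (a sketch; this library targets the older v2 DocumentClient, and documentClient, documentCollection, and itemsToDelete are assumed to already exist):

using Microsoft.Azure.CosmosDB.BulkExecutor;

// Sketch: BulkDeleteAsync takes (partition key value, document id) tuples.
var bulkExecutor = new BulkExecutor(documentClient, documentCollection);
await bulkExecutor.InitializeAsync();

var pkIdPairs = new List<Tuple<string, string>>();
foreach (var item in itemsToDelete)            // your (id, pk) query results
    pkIdPairs.Add(Tuple.Create(item.pk.ToString(), item.id.ToString()));

BulkDeleteResponse response = await bulkExecutor.BulkDeleteAsync(pkIdPairs);
Console.WriteLine($"Deleted {response.NumberOfDocumentsDeleted} docs, " +
                  $"consumed {response.TotalRequestUnitsConsumed} RUs");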

Hope it helps.