I'm building a durable function that periodically (on nights and weekends) processes every record in my Cosmos DB. Right now I only have a few hundred records, but once I get into production I anticipate more than 50k documents, with ~1k more added per week.

I am getting the documents via an activityTrigger with the following bindings:

{
  "bindings": [
    {
      "name": "name",
      "type": "activityTrigger",
      "direction": "in"
    },
    {
      "type": "cosmosDB",
      "direction": "in",
      "name": "articles",
      "databaseName": "Arts",
      "collectionName": "ArtData",
      "connectionStringSetting": "CosmosTrigger_ConnectionString",
      "sqlQuery": "SELECT * FROM c WHERE c.type='article'"
    }
  ],
  "scriptFile": "../dist/GetAllArticleData/index.js"
}

Is there a limit to the total number of documents that are returned to an Azure Function via an SQL Query binding? Or does Azure Functions automatically handle the pagination and there is no upper limit?

If there is no pagination built in, what about chaining durable functions together, where the first activity gets the total row count, then a fan out/in query function is called with the OFFSET and LIMIT clauses being parameters passed in from the orchestrator? Is that a reliable pattern?
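To make the proposed pattern concrete, here is a minimal sketch of how an orchestrator might turn a total row count into OFFSET/LIMIT pairs. `buildPageRanges` and the page size are hypothetical names and values chosen for illustration; nothing here comes from the Azure Functions or Cosmos DB SDKs.

```javascript
// Hypothetical helper: split `total` documents into { offset, limit } pages
// that a fan-out step could hand to one activity call each.
function buildPageRanges(total, pageSize) {
  if (!Number.isInteger(total) || total < 0) {
    throw new RangeError('total must be a non-negative integer');
  }
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError('pageSize must be a positive integer');
  }

  const pages = [];
  for (let offset = 0; offset < total; offset += pageSize) {
    // The last page may be shorter than pageSize.
    pages.push({ offset, limit: Math.min(pageSize, total - offset) });
  }
  return pages;
}
```

In an orchestrator, each range would presumably be passed to an activity via `context.df.callActivity(...)` and the resulting tasks awaited together with `context.df.Task.all(...)`. The paged query would also need an ORDER BY so that the pages do not overlap or skip documents.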

There's no real limit; you just need to keep passing the continuation token until you drain the query. But I have bigger concerns. If your partition key is type, then as you keep adding data to this container you will eventually hit the 20 GB logical partition limit. If type is not the partition key, then you are going to run a really expensive query that will keep getting more expensive over time. Also, why would you run this as a batch? Cosmos + Azure Functions is better suited to streaming updates. It's more economical and scalable to update as you go rather than doing a batch once a week. - Mark Brown
Thanks for the comment. Using the JavaScript bindings, I don't deal with the Cosmos context at all. I just get the documents as a property on the function context. I don't have access to the continuation token at all. I'm wondering if the Azure Function handles the continuation for me automatically. As for the partition key, it is something else, so no worries about the 20GB limit. - joe_coolish
As for why batch? I'm using durable functions fan out/in to handle the batching (I have another function handling the change feed) to execute some business logic that is data order dependent. I don't control the order of the data coming in, so the best way to handle everything is to periodically run everything through the business logic. I need to process everything because data that comes in several weeks from now could be related to data processed last year for example. That's why I will run it at night/weekends. If it takes 8 hours to run, so be it. No one else is using the system. - joe_coolish

1 Answer

The Cosmos DB input binding drains the entire query result set and passes it to your Function; you don't need to handle continuation tokens yourself.
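To illustrate what "draining" a query means, here is a rough JavaScript sketch of the same idea. It is not the binding's actual implementation; it only assumes an iterator that exposes `hasMoreResults()` and `fetchNext()`, which is the shape of the `QueryIterator` in the `@azure/cosmos` SDK. `drainQuery` is a hypothetical name.

```javascript
// Collect every page from a query iterator until it is exhausted.
// Assumption: `iterator` has hasMoreResults() and an async fetchNext()
// that resolves to an object with a `resources` array (one page).
async function drainQuery(iterator) {
  const documents = [];
  while (iterator.hasMoreResults()) {
    const { resources } = await iterator.fetchNext();
    // A page can be empty or undefined even when more results remain.
    for (const doc of resources || []) {
      documents.push(doc);
    }
  }
  return documents;
}
```

With the input binding you never write this loop yourself; the sketch only shows why the result arrives as one complete array.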

Whether the query has 10 results or 1,000, the binding will drain it and pass all of the documents as input.
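On the Function side, that means the activity simply sees an array. A minimal sketch, assuming the binding is named `articles` as in the question's function.json and that each document has an `id` property; the function name and the choice to return only ids are hypothetical:

```javascript
// Activity function body: the cosmosDB input binding named "articles"
// is exposed as context.bindings.articles, already fully populated.
async function getAllArticleData(context) {
  const articles = context.bindings.articles || [];
  context.log(`Received ${articles.length} article documents`);

  // Hypothetical choice: return only the ids so the activity result that
  // Durable Functions has to serialize stays small.
  return articles.map((doc) => doc.id);
}

module.exports = getAllArticleData;
```

Returning ids rather than full documents is just one option; whether it fits depends on what the downstream activities need.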

Here is the code reference.