2
votes

What is the best way to feed data from Azure Event Hubs into Cosmos DB with resilience and some buffering, so that we don't exceed our provisioned Cosmos DB request units (RUs) in busy periods? We want to append to an array in a document if a document for the request already exists, and create a new document if not.

There are numerous options: serverless Azure Functions, Stream Analytics, and Cloud Services are some. Our priorities are resilience, buffering, updating a document's array if the document exists, and cost.


2 Answers

2
votes

I would personally go with Azure Stream Analytics and apply a tumbling window to the Event Hub input. Depending on how complex your post-processing is, you could either output directly to Cosmos DB, or output to an Azure Function (https://azure.microsoft.com/en-us/blog/new-in-stream-analytics-output-to-azure-functions-built-in-anomaly-detection-etc/), crunch the data there, and forward it to Cosmos DB.

This should be the cheapest, most flexible, and most scalable option.
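To illustrate the idea, here is a minimal Python sketch of what the tumbling window plus the "append or create" step could look like inside the function. It is not Stream Analytics code: the event shape (`ts`, `request_id`, `payload`), the window length, and the function names are all assumptions made up for the example, and a plain dict stands in for the Cosmos DB container. The point is that batching per window means each document is written at most once per window, which is what keeps RU usage down.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # hypothetical tumbling-window length


def tumbling_windows(events, window_seconds=WINDOW_SECONDS):
    """Group events into fixed, non-overlapping windows keyed by window start time."""
    windows = defaultdict(list)
    for event in events:
        start = int(event["ts"] // window_seconds) * window_seconds
        windows[start].append(event)
    return dict(sorted(windows.items()))


def merge_window(store, window_events):
    """Append payloads to each request's document, creating the document if missing.

    `store` is a dict standing in for the Cosmos DB container (assumption).
    Returns the number of documents written, i.e. one write per document per window.
    """
    grouped = defaultdict(list)
    for event in window_events:
        grouped[event["request_id"]].append(event["payload"])

    for request_id, payloads in grouped.items():
        doc = store.get(request_id)
        if doc is None:
            # No document for this request yet: create one with an empty array.
            doc = {"id": request_id, "items": []}
        doc["items"].extend(payloads)
        store[request_id] = doc
    return len(grouped)


if __name__ == "__main__":
    events = [
        {"ts": 1, "request_id": "a", "payload": 1},
        {"ts": 3, "request_id": "a", "payload": 2},
        {"ts": 12, "request_id": "b", "payload": 3},
    ]
    store = {}
    for window_events in tumbling_windows(events).values():
        merge_window(store, window_events)
    print(store)
```

Against a real container, the read and write on `store` would be replaced by a read followed by an upsert of the merged document; how you handle concurrent writers to the same document is a separate question this sketch does not cover.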

0
votes

I agree with Sebastian87 here. To answer your question about Cosmos DB throughput provisioning: you will need to do a simple calculation in the Azure Function, before the data is ingested into Cosmos DB, to work out what the throughput should be. Cosmos DB lets you change the provisioned throughput at any time, separately for each collection (although you are still billed for the maximum throughput provisioned within each hour), so it makes sense to scale it up whenever you predict a higher ingestion rate and back down whenever you expect (or observe) a lower one.
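As a rough sketch of the "simple calculation", something like the following Python could work. The RU cost per write, the headroom percentage, and the function name are assumptions for illustration; the 400 RU/s floor and 100 RU/s step reflect the manual-throughput limits as I understand them, so check them against your account before relying on them.

```python
import math

MIN_RU = 400   # assumed minimum manual throughput for a collection
RU_STEP = 100  # assumed provisioning increment


def required_throughput(events_per_second, ru_per_write, headroom_percent=20):
    """Estimate the RU/s to provision for an expected ingestion rate.

    events_per_second: predicted (or observed) writes per second
    ru_per_write:      measured RU charge of one write (assumption: roughly constant)
    headroom_percent:  safety margin on top of the raw estimate
    """
    raw = events_per_second * ru_per_write * (100 + headroom_percent) / 100
    # Round up to the next provisioning step, never below the minimum.
    stepped = math.ceil(raw / RU_STEP) * RU_STEP
    return max(MIN_RU, stepped)


if __name__ == "__main__":
    print(required_throughput(50, 10))   # busy period
    print(required_throughput(1, 5))     # quiet period
```

The result would then be applied to the collection through the SDK or the portal. In the Python SDK I believe this is `replace_throughput` on the container, but treat that as something to verify. Because of the hourly billing mentioned above, scaling up and down many times within the same hour does not save anything.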