We are currently looking at Azure Event Hubs as a mechanism to dispatch messages to background processors. At the moment the queue-based systems are being used.
Most processors are writing data to SQL Server databases, and the writes are wrapped in transactions.
Event Hubs are positioned as at-least-once communication channel, so duplicate messages should be expected. EventProcessorHost
is the recommended API on the read side, which automates lease management and checkpointing using Azure Blob Storage.
But we have an idea, for some most critical processors, to implement checkpointing ourselves using a SQL Server table inside the same database, and write the checkpoint inside the same transaction of the processor. This should give us the strong guarantee of exactly once delivery when needed.
Ignoring lease management for now (just run 1 processor per partition), is SQL-based checkpointing a good idea? Are there other drawbacks, except the need to work on lower level of API and handle checkpoints ourselves?