Please just help me clarify this.

  • When processing events from an Azure Event Hub, I create an EventProcessorHost with ConsumerGroupName X. If I do this in multiple processes (not threads), so that there are multiple EventProcessorHost instances all using ConsumerGroupName X, will they read from the same partition and thus receive the same event multiple times (a race condition)?
  • When processing events using ConsumerGroupName X and another ConsumerGroupName Y, do both ConsumerGroups get all the events, or will each only get events from certain partitions?
  • When processing events in the ProcessEventsAsync method of an IEventProcessor, what does await context.CheckpointAsync(); actually do? Does it set the checkpoint only for the ConsumerGroup, or is it a global setting for the Event Hub, so that those events will never be looked at again? Is the context here a leased partition?

EDIT: OK so I've made some progress (correct me if I'm wrong):

  1. Each ConsumerGroup will get all the messages.
  2. Leases are assigned to an EventProcessorHost, so each host needs a unique name; the consumer group name is not really relevant here.
  3. Still not 100% certain about context.CheckpointAsync, but I believe it applies only to the ConsumerGroup?
This may be partly duplicated in stackoverflow.com/questions/27789320/… - Daniel van Heerden

1 Answer

Yep, if you give multiple EventProcessorHosts the same Consumer Group name, then they will coordinate using blob leases (assuming you've given each host a different unique identifier), so only one host will work on a given partition at a time. Typically you would run multiple processes on multiple machines to parallelize the work. Partitions can and will move between machines as the processes restart, with some delay before another host takes over.
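To make the coordination concrete, here is a minimal in-memory sketch of the idea, not the Azure SDK. The `LeaseStore` class and `balance` function are hypothetical names, and the central round-robin balancer is a simplifying assumption; real hosts compete for blob leases rather than being assigned partitions by a coordinator.

```python
class LeaseStore:
    """Toy stand-in for the blob leases: at most one owner per partition."""

    def __init__(self, partition_ids):
        self.owner = {pid: None for pid in partition_ids}

    def try_acquire(self, partition_id, host_name):
        # A free lease goes to whoever asks first.
        if self.owner[partition_id] is None:
            self.owner[partition_id] = host_name
            return True
        # Otherwise only the current owner "succeeds".
        return self.owner[partition_id] == host_name

    def release_all(self, host_name):
        # Models a host process stopping: its leases become free again.
        for pid, owner in self.owner.items():
            if owner == host_name:
                self.owner[pid] = None


def balance(store, host_names):
    """Hand out unowned partitions round-robin among the live hosts."""
    free = [pid for pid, owner in store.owner.items() if owner is None]
    for i, pid in enumerate(free):
        store.try_acquire(pid, host_names[i % len(host_names)])


store = LeaseStore(["0", "1", "2", "3"])
balance(store, ["host-a", "host-b"])
after_start = dict(store.owner)

# host-b restarts; its partitions move to the remaining host.
store.release_all("host-b")
balance(store, ["host-a"])
after_restart = dict(store.owner)
```

In this model each partition always has exactly one owner within the consumer group, which is why two hosts in the same group do not process the same partition at the same time.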

If you use different consumer group names X and Y, then processors on X will coordinate only with processors on X, and processors on Y will coordinate only with those on Y. You can use the same name on two different processors if each is in a different consumer group. That is, you can have EventProcessorHost "one" in X and another EventProcessorHost "two" in Y, and they shouldn't interfere.
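One way to picture this scoping is to treat each lease as keyed by both the consumer group and the partition. The sketch below is a toy model under that assumption; the `leases` dictionary and `try_acquire` function are hypothetical and do not reflect how the leases are actually laid out in blob storage.

```python
leases = {}  # (consumer_group, partition_id) -> owning host name


def try_acquire(consumer_group, partition_id, host_name):
    """Claim a partition within one consumer group; other groups are unaffected."""
    key = (consumer_group, partition_id)
    owner = leases.setdefault(key, host_name)
    return owner == host_name


x_one = try_acquire("X", "0", "one")      # host "one" takes partition 0 in group X
y_two = try_acquire("Y", "0", "two")      # host "two" takes partition 0 in group Y
x_other = try_acquire("X", "0", "other")  # blocked: X/0 is already owned by "one"
y_one = try_acquire("Y", "1", "one")      # the name "one" can be reused in group Y
```

Only hosts in the same group contend for a partition, so the two groups read the same partitions without interfering with each other.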

When checkpointing progress, it is indeed just within that ConsumerGroup. As I mentioned here, I believe the offset is tracked inside the blob used for the leases for coordination. As such, each ConsumerGroup can checkpoint without knowing anything about the others (but you probably shouldn't checkpoint on every message, since each checkpoint is a write to storage).
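The following sketch illustrates both points: checkpoints stored per consumer group, and checkpointing every N events rather than on every message. It is an in-memory model only. `CheckpointStore`, `process`, and `checkpoint_every` are hypothetical names, and list indices stand in for real Event Hub offsets.

```python
class CheckpointStore:
    """Toy stand-in for the per-group checkpoint data."""

    def __init__(self):
        self.offsets = {}  # (consumer_group, partition_id) -> last checkpointed offset
        self.writes = 0    # counts simulated storage writes

    def checkpoint(self, consumer_group, partition_id, offset):
        self.offsets[(consumer_group, partition_id)] = offset
        self.writes += 1

    def resume_offset(self, consumer_group, partition_id):
        # -1 means this group has never checkpointed this partition.
        return self.offsets.get((consumer_group, partition_id), -1)


def process(store, consumer_group, partition_id, events, checkpoint_every=100):
    """Handle events after the group's last checkpoint, checkpointing in batches."""
    start = store.resume_offset(consumer_group, partition_id) + 1
    handled = 0
    for offset in range(start, len(events)):
        _ = events[offset]  # real work on the event would go here
        handled += 1
        if handled % checkpoint_every == 0:
            store.checkpoint(consumer_group, partition_id, offset)
    return handled


events = [f"event-{i}" for i in range(250)]
store = CheckpointStore()

handled_x = process(store, "X", "0", events)   # group X reads everything
resume_y = store.resume_offset("Y", "0")       # group Y is untouched by X's checkpoints
replayed_x = process(store, "X", "0", events)  # X restarts from its last checkpoint
```

In this model group X performs two storage writes for 250 events, and after a restart it re-reads the events that arrived after its last checkpoint. That replay is the trade-off for checkpointing less often, so processing should tolerate seeing an event more than once.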