0
votes

Kafka messaging use at-least-once message delivery to ensure every message to be processed, and uses a message offset to indicates which message is to deliver next.

When there are multiple consumers, if some deadly message cause a consumer crash during message processing, will this message be redelivered to other consumers and spread the death? If some slow message blocked a single consumer, can other consumers keep going and process subsequent messages? Or even worse, if a slow and deadly message caused a consumer crash, will it cause other consumers start from its offset again?

1

1 Answers

3
votes

There are a few things to consider here:

  1. A Kafka topic partition can be consumed by one consumer in a consumer group at a time. So if two consumers belong to two different groups they can consume from the same partition simultaneously.
  2. Stored offsets are per consumer group. So each topic partition has a stored offset for each active (or recently active) consumer group with consumer(s) subscribed to that partition.
  3. Offsets can be auto-committed at certain intervals, or manually committed (by the consumer application).

So let's look at the scenarios you described.

  • Some deadly message causes a consumer crash during message processing
    • If offsets are auto-committed, chances are by the time the processing of the message fails and crashes the consumer, the offset is already committed and the next consumer in the group that takes over would not see that message anymore.
    • If offsets are manually committed after processing is done, then the offset of that message will not be committed (for simplicity, I am assuming one message is read and processed at a time, but this can be easily generalized) because of the consumer crash. So any other consumer in the group that is (will be) subscribed to that topic will read the message again after taking over that partition. So it's possible that it will crash other consumers too. If offsets are committed before message processing, then the next consumers won't see the message because the offset is already committed when the first consumer crashed.
  • Some slow message blocks a single consumer: As long as the consumer is considered alive no other consumer in the group will take over. If the slowness goes beyond the consumer's session.timeout.ms the consumer will be considered dead and removed from the group. So whether another consumer in the group will read that message depends on how/when the offset is committed.
  • Slow and deadly message causes a consumer crash: This scenario should be similar to the previous ones in terms of how Kafka handles it. Either slowness is detected first or the crash occurs first. Again the main thing is how/when the offset is committed.

I hope that helps with your questions.