I am trying to come up with a design in which a number of processing agents consume messages from a Kafka topic in parallel.
I would like to get as close as possible to exactly-once processing per message across the whole consumer group, although I can tolerate at-least-once semantics.
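For concreteness, here is roughly what I have in mind for each agent: a minimal sketch using the Java `KafkaConsumer` with auto-commit disabled, committing offsets only after processing, which I understand gives at-least-once delivery. The topic name `work-topic`, the group id `processing-agents`, and the `process()` helper are placeholders of mine, not anything from a real system:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ProcessingAgent {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // All agents share one group id, so the topic's partitions
        // are divided among them.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "processing-agents");
        // Disable auto-commit; commit manually after processing succeeds.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("work-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // placeholder for the agent's business logic
                }
                // Commit only after the whole batch is processed; a crash
                // before this line means the batch is redelivered
                // (at-least-once, possible duplicates).
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}
```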
I find the documentation unclear in several regards, and I have a few specific questions whose answers would tell me whether this is a viable approach:
- If a message is published to a topic, does it exist only once across all the partitions of that topic, or can it be replicated to more than one partition? I have read statements that could support either interpretation.
- Is the "offset" a property of the partition alone, or of each consumer (or consumer group) per partition?
- When I start a new consumer, does it resume from an offset stored for the consumer group as a whole, or from one stored for each partition it is assigned?
- If I want to scale up by adding consumers and there are no free partitions (I believe there can be at most one consumer per partition within a group), will Kafka rebalance existing messages away from the existing partitions, and how would that affect the offsets and the consumers of those partitions?
Are there any other points I am missing that would help my understanding of this?