I've been using Kafka for some months now, and I realized that some of the core concepts are still not clear to me. My question concerns the relationship between consumerId, groupId and offsets. In our application we need Kafka to work in the publish-subscribe paradigm, so we use a different group id for each consumer, and these ids are randomly generated.
I used to think that by setting `auto.offset.reset = latest` my consumers would always receive the messages they had not received yet, but lately I learned that this is not the case. That only works if the consumer has no offsets committed yet. In any other case, the consumer will continue receiving messages with offsets greater than the last offset it committed.
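The rule described above can be sketched as a small pure-Python function. This is only a toy model, not the Kafka client API: `starting_offset` and its parameters are hypothetical names, "committed" is taken to mean the next offset to read, and only the `latest` and `earliest` policies are modelled.

```python
from typing import Optional


def starting_offset(committed: Optional[int], log_start: int, log_end: int,
                    auto_offset_reset: str = "latest") -> int:
    """Return the offset a consumer group starts reading from (toy model).

    committed: the group's committed position (next offset to read), or
               None if the group has never committed for this partition.
    """
    if committed is not None:
        # A committed offset wins; the reset policy is not consulted.
        return committed
    if auto_offset_reset == "latest":
        return log_end      # only messages produced from now on
    if auto_offset_reset == "earliest":
        return log_start    # everything still retained in the log
    raise ValueError(f"unsupported policy: {auto_offset_reset}")
```

For example, with five messages in the log (offsets 0 to 4), `starting_offset(None, 0, 5)` yields 5, while `starting_offset(3, 0, 5)` yields 3 regardless of the policy.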
Since I always create new consumers with random group ids, I realized that my consumers have "no memory": they are new consumers and they will never have offsets committed, so the `auto.offset.reset = latest` policy will always apply. And here is where my doubts start. Suppose the following scenario:
- I have two client applications, A and B, with one consumer each, working in the publish-subscribe way (thus, with different group ids). Both consumers are subscribed to the topic `my-topic`. The `auto.offset.reset` setting is `latest` for both consumers.
- Some producer (or producers) publishes messages M1, M2 and M3 to topic `my-topic`.
- Both A and B receive M1, M2 and M3.
- Now I shut down application B.
- Producers produce messages M4 and M5.
- Application A receives messages M4 and M5.
- Now I restart application B. Remember, `groupId` is random, and I am not setting any consumer id, so that means this is a new consumer (right?). Application B does not receive any message.
- Producers publish messages M6 and M7.
- Both applications A and B receive messages M6 and M7.
So, summarizing, if I am not wrong, A receives all messages but B has missed M4 and M5. I've tried this with `kafka-console-consumer.sh` and it behaves this way.
So, how can I make application B receive the messages published while it was shut down? I know that if I start it with the same groupId it was originally started with, it will read messages M4 and M5, but that means setting the group id. Is it possible to set the consumer id too, and get the same behaviour?
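To make the scenario concrete, here is a minimal in-memory sketch that replays it twice: once with a random group id on every start of B, and once with the same group id reused. It is a toy model and not the Kafka client API. `ToyTopic`, `run` and the group names are hypothetical, and it assumes a single partition, offsets committed after every poll, and committed offsets that never expire.

```python
import uuid


class ToyTopic:
    """Single-partition, in-memory stand-in for a topic (not the Kafka API)."""

    def __init__(self):
        self.log = []          # messages; list index == offset
        self.committed = {}    # group id -> next offset to read

    def produce(self, *messages):
        self.log.extend(messages)

    def join(self, group_id):
        # A group with no committed offset starts at the log end ("latest");
        # a group that already committed keeps its stored position.
        self.committed.setdefault(group_id, len(self.log))

    def poll(self, group_id):
        start = self.committed[group_id]
        self.committed[group_id] = len(self.log)   # commit after reading
        return self.log[start:]


def run(b_first_group, b_second_group):
    """Replay the scenario and return everything application B received."""
    topic = ToyTopic()
    topic.join("group-a")
    topic.join(b_first_group)
    topic.produce("M1", "M2", "M3")
    topic.poll("group-a")
    received_b = topic.poll(b_first_group)      # B reads M1..M3
    topic.produce("M4", "M5")                   # B is shut down here
    topic.poll("group-a")
    topic.join(b_second_group)                  # B restarts
    received_b += topic.poll(b_second_group)
    topic.produce("M6", "M7")
    received_b += topic.poll(b_second_group)
    return received_b


# Random group id on every start: M4 and M5 are skipped.
random_ids = run(str(uuid.uuid4()), str(uuid.uuid4()))
# Same group id on both starts: the committed offset is reused.
stable_id = run("group-b", "group-b")
print(random_ids)   # ['M1', 'M2', 'M3', 'M6', 'M7']
print(stable_id)    # ['M1', 'M2', 'M3', 'M4', 'M5', 'M6', 'M7']
```

In this model the only thing that identifies "the same consumer" across restarts is the key under which the offset was stored, which here is the group id.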
Or, put another way, what is meant by starting the same consumer again? Are two consumers the same consumer if they have the same groupId, the same consumerId, or both?
By the way, are consumerId and the property client.id the same thing?