
I am new to Kafka, but I understand that Kafka stores consumer offsets in the __consumer_offsets topic and that offsets.retention.minutes defines how long an offset is kept before it is deleted. However, my __consumer_offsets topic has cleanup.policy=compact. Here is the result when I described the __consumer_offsets topic:

Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:1 Configs:compression.type=producer,cleanup.policy=compact,segment.bytes=104857600

So my question is: with offsets.retention.minutes=2, will the consumer offsets be deleted from the topic after 2 minutes, or will they still be there in compacted form?

@mike My question is different from "Kafka log compacting not starting". My question is whether the consumer offsets will be deleted because of offsets.retention.minutes, or compacted because of cleanup.policy=compact on the __consumer_offsets topic. - Anshita Singh

1 Answer


(Some brief context about how consumer offsets are stored in Kafka)

The first thing to know is that consumer offsets are stored in the __consumer_offsets topic in roughly the message format below.

Key : Value (approx format)

(consumer-group, topic-name, partition) : (....offset,.... commit-timestamp, expire-timestamp)

Since __consumer_offsets is a topic, existing records cannot be updated in place, so each offset commit is persisted as a new record.
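To make the append-only behaviour concrete, here is a minimal sketch in Python. It is an in-memory model only, not the broker's actual implementation: `OffsetKey`, `commit`, `current_offset`, and the group and topic names are all hypothetical names made up for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class OffsetKey(NamedTuple):
    """Hypothetical stand-in for the (consumer-group, topic-name, partition) key."""
    group: str
    topic: str
    partition: int


# In-memory stand-in for one partition of __consumer_offsets (an assumption,
# not how the broker stores it): an append-only list of (key, offset) records.
log: List[Tuple[OffsetKey, Optional[int]]] = []


def commit(key: OffsetKey, offset: Optional[int]) -> None:
    """Every commit appends a new record; nothing is updated in place."""
    log.append((key, offset))


def current_offset(key: OffsetKey) -> Optional[int]:
    """The effective offset is the value of the most recent record for the key."""
    for record_key, value in reversed(log):
        if record_key == key:
            return value
    return None


key = OffsetKey("my-group", "orders", 0)
commit(key, 10)
commit(key, 25)
commit(key, 42)

print(len(log))             # 3 records exist for the same key
print(current_offset(key))  # 42, the latest one wins
```

Three commits leave three records in the log, but only the last one matters to a reader. That is why a compacted cleanup policy suits this topic.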

(Now coming back to your question)

Setting offsets.retention.minutes=2 causes the committed offsets of a consumer group to expire if the group has no active consumer on your topic for that period of time. The old records are not removed at that moment; instead the offset is reset to NULL by appending a new record with a NULL value (a tombstone) for the key. So your __consumer_offsets topic will be updated with a new record as below:

Key -> (consumer-group, topic-name, partition) : Value -> NULL

The older records for the same key will still be there until compaction runs. If your consumer comes back after a restart at this point, it will find no valid committed offset in the __consumer_offsets topic. To decide where to start reading, the consumer uses the property "auto.offset.reset", which you can set to "earliest" or "latest" depending on your application logic.
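The decision a restarted consumer faces can be sketched as below. This is a simplified model, not the client's real code: `resolve_start_position` and its parameters are hypothetical, and it ignores out-of-range offsets. The value "none" is also a valid setting for auto.offset.reset; with it the consumer fails instead of picking a position.

```python
from typing import Optional


def resolve_start_position(
    committed: Optional[int],
    reset_policy: str,
    log_start: int,
    log_end: int,
) -> int:
    """Pick where a consumer starts reading one partition (simplified model)."""
    if committed is not None:
        # A valid committed offset always wins; the reset policy is not consulted.
        return committed
    if reset_policy == "earliest":
        return log_start  # oldest record still available in the partition
    if reset_policy == "latest":
        return log_end    # only records produced from now on
    # Models auto.offset.reset=none: no committed offset is treated as an error.
    raise LookupError("no committed offset and reset policy is %r" % reset_policy)


# The offset expired (NULL), so the policy decides the starting position.
print(resolve_start_position(None, "earliest", log_start=100, log_end=500))  # 100
print(resolve_start_position(None, "latest", log_start=100, log_end=500))    # 500
# The offset is still committed, so the policy is ignored.
print(resolve_start_position(250, "latest", log_start=100, log_end=500))     # 250
```

With "earliest" the application may reprocess old records, and with "latest" it may skip records produced while it was down, so the choice depends on which of the two your application tolerates better.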

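As for the cleanup policy of this topic, "compact" means that only the latest record for each key is kept once the log cleaner has compacted the log. This is not triggered by the time-based retention settings at cluster level (log.retention.minutes and similar), which apply to topics with cleanup.policy=delete. Compaction runs in the background once enough of the log is "dirty" (see min.cleanable.dirty.ratio), and it never touches the active segment. So after compaction only the latest committed offset for each key is left, and the NULL records written for expired offsets are themselves removed once they are older than delete.retention.ms. In short, both settings apply: offsets.retention.minutes decides when an offset expires, and compaction is what eventually removes the old records, giving you the compacted form of the topic.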
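The effect of compaction on such a log can be sketched as follows. This is a toy model under stated assumptions, not the broker's log cleaner: `compact` is a hypothetical function, records are plain `(key, value, timestamp_ms)` tuples, and the `delete_retention_ms` parameter only mirrors the idea behind the topic setting delete.retention.ms.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, Optional[int], int]  # (key, value or None, timestamp_ms)


def compact(records: List[Record], now_ms: int, delete_retention_ms: int) -> List[Record]:
    """Keep only the latest record per key; drop tombstones that are old enough."""
    latest = {}
    for position, (key, value, ts) in enumerate(records):
        latest[key] = (position, value, ts)  # later records overwrite earlier ones

    survivors = []
    for key, (position, value, ts) in latest.items():
        is_tombstone = value is None
        if is_tombstone and now_ms - ts > delete_retention_ms:
            continue  # an expired tombstone is removed, so the key disappears
        survivors.append((position, key, value, ts))

    survivors.sort()  # preserve the original log order of the surviving records
    return [(key, value, ts) for _, key, value, ts in survivors]


records = [
    ("g1/orders/0", 10, 1000),
    ("g2/orders/0", 5, 1500),
    ("g1/orders/0", 20, 2000),
    ("g1/orders/0", None, 3000),  # offset expired: tombstone for g1
]

# Shortly after expiry the tombstone is still the latest record for g1.
print(compact(records, now_ms=4000, delete_retention_ms=5000))
# [('g2/orders/0', 5, 1500), ('g1/orders/0', None, 3000)]

# Later the tombstone itself is dropped and g1 is gone from the log.
print(compact(records, now_ms=10000, delete_retention_ms=5000))
# [('g2/orders/0', 5, 1500)]
```

The first call shows the "compacted form" the question asks about. The second shows that an expired offset does eventually vanish from the topic, only later than the two minutes in offsets.retention.minutes.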