3
votes

When I run this command, I get two topics. I know I created the test topic, but I also see an additional topic called "__consumer_offsets". The name implies it is related to consumer offsets, but how is it being used?

$ bin/kafka-topics.sh --list --zookeeper localhost:2181
__consumer_offsets
test

$ bin/kafka-topics.sh --describe --zookeeper localhost:2181
Topic:__consumer_offsets        PartitionCount:50       ReplicationFactor:1     Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
        Topic: __consumer_offsets       Partition: 0    Leader: 0       Replicas: 0     Isr: 0
        Topic: __consumer_offsets       Partition: 1    Leader: 0       Replicas: 0     Isr: 0
        ...
        Topic: __consumer_offsets       Partition: 48    Leader: 0       Replicas: 0     Isr: 0
        Topic: __consumer_offsets       Partition: 49    Leader: 0       Replicas: 0     Isr: 0

This is happening in Kafka 1.1.0. Why are there 50 partitions? I am also looking for a way to disable or hide this topic, because every time I run "describe" on the topics, it first prints the 50 partitions of __consumer_offsets and only then prints my topics.

3
The link stackoverflow.com/questions/39529511/… partially addresses the question. I don't even see the _schema topic. – humility

3 Answers

5
votes

In the initial versions of Kafka, offsets were managed in ZooKeeper, but Kafka has continuously evolved over time, introducing many new features. Kafka now manages offsets in an internal/system-level topic, __consumer_offsets.
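You normally do not read this internal topic directly; the broker exposes each group's committed positions. A minimal sketch to inspect them (the group name my-group is made up for this example):

# show the committed offset, log-end offset, and lag per partition for a group
$ bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group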

The 50 partitions are not the regular topic default (topics created without an explicit partition count fall back to the broker setting num.partitions, which defaults to 1). The internal __consumer_offsets topic has its own broker setting, offsets.topic.num.partitions, which defaults to 50; Kafka auto-creates the topic with that many partitions the first time consumer groups are used.
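If 50 is more than you need (for example on a single-broker dev machine), the relevant broker settings can be changed in server.properties before the topic is first auto-created. A sketch, using values that match the describe output above:

# server.properties – settings that shape the auto-created __consumer_offsets topic
offsets.topic.num.partitions=50
offsets.topic.replication.factor=1
offsets.topic.segment.bytes=104857600

Note that these only apply when __consumer_offsets is first created; they do not retroactively change an existing topic.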

1
vote

Consumers store the offset of the last consumed message in the Kafka topic __consumer_offsets, keyed by the consumer group id.
This enables the consumers of a group (each, obviously, with a different consumer id) to process the next message after the last consumed one and avoid duplicate message processing.
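A quick way to see this with the CLI tools (the group name my-group is an arbitrary example):

# consume as part of a group; the group's position is committed to __consumer_offsets
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --group my-group --from-beginning

Stop the consumer and run the same command again: because the group already has a committed offset, it resumes after that position; --from-beginning only takes effect when the group has no stored offset yet.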

0
votes

The topic __consumer_offsets is used by consumers to store the offsets of the messages they read. It enables recovery: when a consumer restarts, it reads the last position it committed before going down and continues processing from the next offset.

@cricket_007 was right: you can get duplicates by default in Kafka, since it uses at-least-once semantics.
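The at-least-once behavior follows from when offsets are committed relative to processing. A minimal consumer config sketch (the file name consumer.properties and the group my-group are examples; such a file can be passed to kafka-console-consumer.sh via --consumer.config):

# consumer.properties – defaults that give at-least-once delivery
group.id=my-group
# offsets are auto-committed on an interval; a crash between processing a
# message and the next commit means that message is delivered again on restart
enable.auto.commit=true
auto.commit.interval.ms=5000
# only consulted when the group has no committed offset yet
auto.offset.reset=earliest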