38
votes

I'm a bit confused about where offsets are stored when using Kafka and Zookeeper. It seems like offsets in some cases are stored in Zookeeper, in other cases they are stored in Kafka.

What determines whether the offset is stored in Kafka or in Zookeeper? And what are the pros and cons?

NB: Of course I could also store the offset on my own in some different data store but that is not part of the picture for this post.

Some more details about my setup:

  • I run these versions: KAFKA_VERSION="0.10.1.0", SCALA_VERSION="2.11"
  • I connect to Kafka/Zookeeper using kafka-node from my NodeJS application.

3 Answers

56
votes

Older versions of Kafka (pre-0.9) store offsets in ZK only, while newer versions of Kafka store offsets by default in an internal Kafka topic called __consumer_offsets (newer versions might still commit to ZK, though).

The advantage of committing offsets to the broker is that the consumer does not depend on ZK, so clients only need to talk to brokers, which simplifies the overall architecture. Also, for large deployments with a lot of consumers, ZK can become a bottleneck, while Kafka can handle this load easily (committing offsets is the same thing as writing to a topic, and Kafka scales very well here; in fact, by default __consumer_offsets is created with 50 partitions IIRC).
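To illustrate how that commit load gets spread out: as far as I know, the broker picks the __consumer_offsets partition for a consumer group by hashing the group id (Java's String.hashCode) and taking it modulo the partition count. Below is a minimal Python sketch of that idea, not the broker's actual code. It assumes the default of 50 partitions, and the group names are made up.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode(): 32-bit signed, over UTF-16 code units."""
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]   # one UTF-16 code unit
        h = (31 * h + unit) & 0xFFFFFFFF      # keep 32 bits, like Java int overflow
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Sketch of the group -> __consumer_offsets partition mapping.

    Assumption: abs(hashCode) % num_partitions, with Integer.MIN_VALUE mapped to 0.
    """
    h = java_string_hash(group_id)
    non_negative = 0 if h == -0x80000000 else abs(h)
    return non_negative % num_partitions


# Hypothetical group ids, only for illustration.
for group in ("billing-service", "audit-log"):
    print(group, "->", offsets_partition_for(group))
```

Every commit from the same group lands in the same partition, while different groups are distributed across all partitions (and therefore across brokers).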

I am not familiar with NodeJS or kafka-node. How offsets are committed depends on the client implementation.

Long story short: if you use 0.10.1.0 brokers, you can commit offsets to the topic __consumer_offsets, but only if your client implements this protocol.

In more detail, it depends on your broker and client versions (and on which consumer API you are using), because older clients can talk to newer brokers. First, both broker and client need to be version 0.9 or later to be able to write offsets into the Kafka topic. If an older client connects to a 0.9 broker, it will still commit offsets to ZK.

For Java consumers:

It depends on which consumer you are using. Before 0.9 there are two "old consumers", namely the "high level consumer" and the "low level consumer". Both commit offsets directly to ZK. Since 0.9, both consumers have been merged into a single consumer, called the "new consumer" (it basically unifies the low level and high level APIs of the old consumers; this means that in 0.9 there are three types of consumers). The new consumer commits offsets to the brokers (i.e., the internal Kafka topic).
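The rules above can be condensed into a small decision function. This is only a Python sketch of the reasoning in this answer, not any real client API: the function name, the "old"/"new" labels and the return strings are all made up for illustration, and dual commit is ignored here.

```python
def parse_version(version: str) -> tuple:
    """Turn '0.10.1.0' into (0, 10, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def offset_store(broker_version: str, client_version: str, consumer_api: str) -> str:
    """Return where offsets end up, following the rules described in the answer.

    consumer_api is 'old' (high level / low level consumer) or 'new' (0.9+ consumer).
    """
    broker = parse_version(broker_version)
    client = parse_version(client_version)

    if consumer_api == "old":
        # Old consumers commit to ZooKeeper, even against a newer broker.
        return "zookeeper"
    if consumer_api == "new":
        if broker < (0, 9) or client < (0, 9):
            raise ValueError("the new consumer needs broker and client >= 0.9")
        return "kafka (__consumer_offsets)"
    raise ValueError(f"unknown consumer_api: {consumer_api!r}")


print(offset_store("0.10.1.0", "0.10.1.0", "new"))
print(offset_store("0.9.0.0", "0.8.2.2", "old"))
```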

To make upgrading easier, there is also the possibility to "double commit" offsets using the old consumer (as of 0.9). If you enable this via dual.commit.enabled, offsets are committed to both ZK and the __consumer_offsets topic. This allows you to switch from the old consumer API to the new consumer API while moving your offsets from ZK to the __consumer_offsets topic.
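As a rough sketch of what such a migration could look like on the config side: the old consumer has the properties offsets.storage and dual.commit.enabled, and the usual approach is to go through a dual-commit phase before turning ZK commits off. The Python helper below just renders the old-consumer properties for each phase. The phase names, group id and ZooKeeper address are hypothetical, so check the documentation of your Kafka version before relying on this.

```python
def old_consumer_props(group_id: str, zookeeper_connect: str, phase: str) -> dict:
    """Build old-consumer properties for one migration phase (illustrative only)."""
    props = {"group.id": group_id, "zookeeper.connect": zookeeper_connect}
    if phase == "zookeeper_only":
        props["offsets.storage"] = "zookeeper"
    elif phase == "dual_commit":
        # Commit to both ZK and __consumer_offsets while consumers are rolled.
        props["offsets.storage"] = "kafka"
        props["dual.commit.enabled"] = "true"
    elif phase == "kafka_only":
        # After all consumers in the group run with dual commit, drop ZK commits.
        props["offsets.storage"] = "kafka"
        props["dual.commit.enabled"] = "false"
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    return props


def to_properties(props: dict) -> str:
    """Render a dict as Java-style .properties text, sorted for stable output."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


# Hypothetical group id and ZooKeeper address.
for phase in ("zookeeper_only", "dual_commit", "kafka_only"):
    print(f"# {phase}")
    print(to_properties(old_consumer_props("my-group", "localhost:2181", phase)))
```

Each phase would be rolled out with a rolling restart of the consumers in the group before moving on to the next one.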

6
votes

It all depends on which consumer you're using. You should choose the right consumer based on your Kafka version.

For version 0.8 brokers, use the HighLevelConsumer. The offsets for your groups are stored in Zookeeper.

For brokers 0.9 and higher, you should use the new ConsumerGroup. The offsets are stored with the Kafka brokers.

Keep in mind that HighLevelConsumer will still work with versions past 0.8, but it has been deprecated in 0.10.1 and support will probably go away soon. The ConsumerGroup has rolling migration options to help you move away from HighLevelConsumer if you were committed to using it.

1
votes

Offsets in Kafka are stored as messages in a separate topic named '__consumer_offsets'. In the latest versions of Kafka, each consumer commits a message to this topic at periodic intervals.
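To make "offsets stored as messages" a bit more concrete, here is a toy Python model. It assumes (this is not stated above) that each commit is keyed by group, topic and partition, and that the topic is log-compacted so only the latest commit per key has to be retained. The class and field names are invented for illustration, and the real record format carries more fields than this.

```python
class OffsetsLog:
    """Toy model of an offsets topic: an append-only list of keyed commits."""

    def __init__(self):
        self.records = []  # list of ((group, topic, partition), offset)

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        # A commit is just another message appended to the log.
        self.records.append(((group, topic, partition), offset))

    def latest(self) -> dict:
        # Replaying the log: later records overwrite earlier ones per key.
        state = {}
        for key, offset in self.records:
            state[key] = offset
        return state

    def compact(self) -> None:
        # Compaction keeps only the newest record for every key.
        self.records = list(self.latest().items())


log = OffsetsLog()
log.commit("my-group", "orders", 0, 10)
log.commit("my-group", "orders", 0, 25)   # periodic commit, same key
log.commit("my-group", "orders", 1, 7)
log.compact()
print(log.latest())
```

A consumer that restarts only needs the latest committed offset for each (group, topic, partition) key, which is why keeping just the newest record per key is enough.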