2
votes

I've been having all sorts of instabilities related to Kafka and offsets. Things like workers crashing on startup with exceptions related to invalidate offsets, and other things I don't understand.

I read that it is recommended to migrate offsets to be stored in Kafka instead of Zookeeper. I found the below in the Kafka documentation:

Migrating offsets from ZooKeeper to Kafka Kafka consumers in earlier releases store their offsets by default in ZooKeeper. It is possible to migrate these consumers to commit offsets into Kafka by following these steps: 1. Set offsets.storage=kafka and dual.commit.enabled=true in your consumer config. 2. Do a rolling bounce of your consumers and then verify that your consumers are healthy. 3. Set dual.commit.enabled=false in your consumer config. 4. Do a rolling bounce of your consumers and then verify that your consumers are healthy.

A roll-back (i.e., migrating from Kafka back to ZooKeeper) can also be performed using the above steps if you set offsets.storage=zookeeper.

http://kafka.apache.org/documentation.html#offsetmigration

But, again, I don't understand what this is instructing me to do. I don't see anywhere in my topology config where I configure where offsets are stored. Is it buried in the cluster yaml?

Any advice on if storing offsets in Kafka, rather than Zookeeper, is a good idea? And how I can perform this change?

1

1 Answers

4
votes

At the time of this writing Storm's Kafka spout (see documentation/README at https://github.com/apache/storm/tree/master/external/storm-kafka) only supports managing consumer offsets in ZooKeeper. That is, all current Storm versions (up to 0.9.x and including 0.10.0 Beta) still rely on ZooKeeper for storing such offsets. Hence you should not perform the ZK->Kafka offset migration you referenced above because Storm isn't compatible yet.

You will need to wait until the Storm project -- specifically, its Kafka spout -- supports managing consumer offsets via Kafka (instead of ZooKeeper). And yes, in general it is better to store consumer offsets in Kafka rather than ZooKeeper, but alas Storm isn't there yet.

Update November 2016: The situation in Storm has improved in the meantime. There's now a new, second Kafka spout that is based on Kafka's new 0.10 consumer client, which stores consumer offsets in Kafka (and not in ZooKeeper): https://github.com/apache/storm/tree/master/external/storm-kafka-client.

However, at the time I am writing this, there are still several issues being reported by the users in the storm-user mailing list (such as Urgent help! kafka-spout stops fetching data after running for a while), so I'd use this new Kafka spout with care, and only after thorough testing.