0
votes

I have set up a Kafka spout for a Storm pipeline. I don't want to read the data from either the latest offset or the beginning. Is there any way to start reading from a configurable offset, overriding the offset stored in ZooKeeper? Storm provides ways to read from the latest offset or from the beginning, but neither fits my case.

Use case:
Offset 0: deployed the topology.
Offset 50: changed the topology.
Offset 100: detected that the recent topology has a bug, and want to restart from offset 50.
How can I achieve this?


1 Answer

0
votes

KafkaSpout reads the last committed offset from ZooKeeper. If no offset is stored in ZooKeeper, it uses the configured startOffsetTime. The default in KafkaConfig is:

public long startOffsetTime = kafka.api.OffsetRequest.EarliestTime();

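If you change the value of startOffsetTime and set KafkaConfig.ignoreZkOffsets = true, I think you can make the consumer start from a specific position. Note that startOffsetTime is a timestamp in milliseconds (or one of the special values EarliestTime() and LatestTime()), not a raw offset, so you need the time that corresponds to the offset you want.
If ignoreZkOffsets is true, the spout will always begin reading from the offset defined by KafkaConfig.startOffsetTime, as described above.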
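
To make the decision described above concrete, here is a minimal, self-contained sketch of it in Java. It is a simplified model and not the storm-kafka source: the class StartOffsetSketch, the OffsetLookup interface, and the method chooseStartOffset are hypothetical names, the broker is replaced by a stub, and the timestamp and offsets are made up to mirror the use case in the question. Only the sentinel values -1 (latest) and -2 (earliest) come from Kafka's legacy OffsetRequest API.

```java
import java.util.OptionalLong;

public class StartOffsetSketch {
    // Sentinel values returned by kafka.api.OffsetRequest.LatestTime() / EarliestTime().
    public static final long LATEST_TIME = -1L;
    public static final long EARLIEST_TIME = -2L;

    /** Hypothetical stand-in for asking the broker which offset matches a time. */
    public interface OffsetLookup {
        long offsetBefore(long time);
    }

    /**
     * Simplified model of the choice described in the answer:
     * use the offset committed in ZooKeeper unless it is missing or
     * ignoreZkOffsets is set; otherwise resolve startOffsetTime via the broker.
     */
    public static long chooseStartOffset(OptionalLong committedZkOffset,
                                         boolean ignoreZkOffsets,
                                         long startOffsetTime,
                                         OffsetLookup broker) {
        if (!ignoreZkOffsets && committedZkOffset.isPresent()) {
            return committedZkOffset.getAsLong();
        }
        return broker.offsetBefore(startOffsetTime);
    }

    public static void main(String[] args) {
        // Made-up timestamp (ms) standing in for "when the topology was changed".
        long changeTime = 1_500_000_000_000L;

        // Stub broker: earliest -> 0, latest -> 100, any real timestamp -> 50.
        OffsetLookup broker = time -> {
            if (time == EARLIEST_TIME) return 0L;
            if (time == LATEST_TIME) return 100L;
            return 50L;
        };

        // Committed offset in ZooKeeper wins when ignoreZkOffsets is false.
        System.out.println(chooseStartOffset(OptionalLong.of(100L), false, changeTime, broker)); // 100
        // With ignoreZkOffsets = true, startOffsetTime decides the position.
        System.out.println(chooseStartOffset(OptionalLong.of(100L), true, changeTime, broker));  // 50
        // Nothing committed yet: fall back to startOffsetTime (earliest here).
        System.out.println(chooseStartOffset(OptionalLong.empty(), false, EARLIEST_TIME, broker)); // 0
    }
}
```

In a real topology the same two knobs would be set on the spout's configuration object before constructing the spout. The real spout may apply additional checks that this model leaves out, so treat it as an illustration of the idea only.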

Also, have a look at this article: "How do I accurately get offsets of messages for a certain timestamp using OffsetRequest?"
