
I am using a single-node Kafka broker (v0.10.2) on hardware with 8 cores, 16 GB RAM, and a 1 TB hard disk, together with ZooKeeper (v3.4.8). I have a topic with 200 partitions containing a total of 3 million messages. It took 5 days to process all of them, and as soon as processing finished, i.e. kafka-consumer-groups.sh showed 0 lag on every partition of the topic, I stopped the consumer. But 6 hours later it was again showing a lag of 2 million messages, which I found were duplicate messages. This happens very frequently. My offsets are stored on the Kafka broker itself. My server configuration is:

broker.id=1
delete.topic.enable=true
#listeners=PLAINTEXT://:9092
#advertised.listeners=PLAINTEXT://your.host.name:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/kafka/data/logs
num.partitions=1
num.recovery.threads.per.data.dir=5
log.flush.interval.messages=10000
#log.flush.interval.ms=1000
log.retention.hours=480
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=<zkIp>:2181
zookeeper.connection.timeout.ms=6000

Is there anything in the configuration that I am missing? Any help is appreciated.

It's the consumer app's responsibility to commit offsets. How is your consumer configured, and are you sure it's committing offsets at regular intervals, given that it takes 5 days to finish processing? – Hans Jespersen
@HansJespersen Yes, I am manually committing the offset after every message, and when the lag is 0 I stop my consumers (a sketch of such a loop follows the comments). If you want, I can share the consumer config as well. – Abhimanyu
Does that 2-million lag happen in only a few partitions, or across all 200 partitions? – amethystic
@amethystic It's the combined lag across all 200 partitions. – Abhimanyu
If you run consumers just long enough to process a few messages and then shut them down cleanly, do they continue from where they left off when they restart? – Hans Jespersen
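
For context, a minimal sketch of a consumer loop that commits after every record, as described in the comments above (the bootstrap server, group id, and topic name are placeholders, not the actual values used; the real consumer code was not posted):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class PerMessageCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder host
        props.put("group.id", "my-group");               // placeholder group id
        props.put("enable.auto.commit", "false");        // offsets committed manually
        props.put("auto.offset.reset", "earliest");      // same setting referred to in the answer below
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000); // 0.10.x poll(long)
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // application-specific work goes here
                    // commit this record's offset (+1 = next offset to read)
                    consumer.commitSync(Collections.singletonMap(
                            new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1)));
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // placeholder for the real processing logic
    }
}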

1 Answer


The issue was that offsets.retention.minutes was 1440 (1 day), so the committed offsets in the __consumer_offsets topic were wiped out after that time. When the consumer restarted it could not find where to start from, and because auto.offset.reset is set to earliest on my consumers, the messages were consumed again. Setting offsets.retention.minutes to 143200 solved the issue.
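
For reference, the relevant settings end up looking roughly like this; the broker line goes into server.properties alongside the configuration shown in the question, while the consumer lines are an assumption about a typical manual-commit setup rather than the exact consumer configuration used:

# broker (server.properties): keep committed offsets longer than any pause in consumption
offsets.retention.minutes=143200

# consumer (assumed settings for a manual-commit consumer)
enable.auto.commit=false
auto.offset.reset=earliest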