Kafka Streams processor taking long time to consume changelog topics and initialize state stores

Question

I'm working on a stream processor that has KStream-KStream and KStream-KTable joins and also uses a state store to remove duplicates while doing the joins.

We have been load testing this processor, and as the number of messages in the topics grows, the stream processor takes a long time (~1 hour) to consume the changelog topics and initialize the state stores whenever it is restarted or redeployed.

We have a retention of 7 days for the topics.

This is more a description of your observation than a question? What do you want to know? Are you aware of StandbyTasks? What version do you use? And please, ask a question :) — Matthias J. Sax

Edmondo1984 · Accepted Answer · 2018-05-31T18:53:57

There are several possible reasons for this:

Your broker performance, i.e. how much data your Kafka Streams app can pull from each broker
The performance of your Kafka Streams application itself
Your serialization format (a compact binary format such as Avro usually produces much smaller records than a text format like JSON, so there is less data to restore)

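To avoid expensive restarts, keep the local state store on persistent storage. For example, you can map the default state store folder (/tmp/kafka-streams) to some sort of persistent volume, so the local state survives a restart or redeployment and does not have to be rebuilt from the changelog topics from scratch.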
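
As a rough illustration of the persistent state directory idea, here is a minimal sketch of the relevant settings, built with plain java.util.Properties so it runs without any Kafka dependency. The keys state.dir and num.standby.replicas are standard Kafka Streams configuration names; the application id, broker address, directory path and the class and method names are placeholders I made up, not values from the question. The standby replica setting is optional and relates to the StandbyTasks mentioned in the comment above.

```java
import java.util.Properties;

public class StreamsStateConfig {

    // Builds Kafka Streams settings that keep local state across restarts.
    // Application id and broker address are hypothetical placeholders.
    static Properties build(String stateDir) {
        Properties props = new Properties();
        props.put("application.id", "dedup-join-app");     // hypothetical name
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker
        // Store local state outside /tmp, on a directory backed by a
        // persistent volume, so it is still there after a redeployment.
        props.put("state.dir", stateDir);
        // Optional: keep a warm copy of each store on another instance
        // (standby tasks), which can shorten recovery after a failover.
        props.put("num.standby.replicas", "1");
        return props;
    }

    public static void main(String[] args) {
        // "/var/lib/kafka-streams" is an assumed mount point for the volume.
        Properties props = build("/var/lib/kafka-streams");
        System.out.println(props.getProperty("state.dir"));
    }
}
```

In a real application these properties would be passed to the KafkaStreams constructor together with your topology. The local state can only be reused if the restarted instance sees the same directory with the same application.id, so in a containerized deployment each instance needs its own stable volume.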
