I am trying to better understand how to set up my cluster for running my Kafka Streams application. In particular, I'm trying to get a better sense of the volume of data that will be involved.
In that regard, while I can quickly see that a KTable requires a state store, I wonder whether creating a KStream from a topic immediately means copying that topic's entire log into a state store, presumably in an append-only fashion. Is that the case, especially if we want to expose the stream for interactive queries?
Does Kafka Streams automatically replicate the data into a state store as records arrive in the source topic when it is a KStream? As said above, this seems obvious for a KTable because of the updates, but for a KStream I just want confirmation of what actually happens.
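To make the question concrete, here is roughly the setup I have in mind, sketched with the Kafka Streams DSL (the topic name, store name, and String serdes are placeholders, not my real configuration):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

public class TopologySketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // A KTable clearly needs a state store: it materializes the
        // latest value per key so it can be updated and queried.
        KTable<String, String> table = builder.table(
                "my-topic",
                Consumed.with(Serdes.String(), Serdes.String()),
                Materialized.as("my-table-store"));

        // A plain KStream over the same topic: does this also copy
        // the topic's log into a local state store, or is it just
        // consumed record by record with no materialization?
        KStream<String, String> stream = builder.stream(
                "my-topic",
                Consumed.with(Serdes.String(), Serdes.String()));
    }
}
```

In other words, when I size disks for the cluster, do I need to budget state-store space for the `builder.stream(...)` case as well, or only for materialized KTables?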