How do KTables get their initial values?

Question

I've recently been looking into the Kafka Streams API and I'm having a little trouble fulling understanding KTables. I think I understand the general concepts but I'm struggling with a few of the details.

In my example application, I fetch a bunch of prices and then use the Kafka Streaming API to produce average prices for each product to a compacted Kafka topic (Topic-A). I have a second service that I want to react to these averaged price updates. So in that second service, I create a KTable over Topic-A, and I can query its store successfully.

My goal is to have this second service process & react to these averaged prices in real-time, but also to have access to the latest value for each product on demand. I believe I can use a KTable and Store to do this.

Initially, I believed:

The KTable is backed by a local store (RocksDB instance)
When the KTable is initialized, it consumes the entire of Topic-A to build its KTable

However, it seems as though KTables are (or can be?) backed by a compacted change-log.

Does this mean that upon initialization, the KTable only needs to consume the latest record for each key?
If I run multiple instances of my second service do the KTables share a change log? I imagine if the number of instances was scaled up/down, instances would need to update their local state to account for data from more/less partitions.
Would using a GlobalKTable give me all the K/V pairs available in each instance?

Dmitry Minkovsky Dmitry Minkovsky · Accepted Answer · 2017-12-12T16:27:36

Does this mean that upon initialization, the KTable only needs to consume the latest record for each key?

Yep. If data in the underlying topic is such that each value represents a complete latest value for that key, then the topic can be configured with cleanup.policy=compact and Kafka Streams only needs to read the latest value to restore the KTable (which is a RocksDB store). In terms of data modeling, this is the only kind of data/topic you want/makes sense to use as input for a KTable.

If I run multiple instances of my second service do the KTables share a change log?

Yes, they read from the same changelog topic but they generate their own RocksDB stores based on the state.dir param you provide in the Kafka Streams configuration.

Would using a GlobalKTable give me all the K/V pairs available in each instance?

Yes, but GlobalKTables are slightly more limited in what you can do with them than regular KTables. I believe the new 1.0.0 release has added functionality to GlobalKTables, but they still have some limitations.

How do KTables get their initial values?

1 Answers