I've recently been looking into the Kafka Streams API and I'm having a little trouble fully understanding KTables. I think I understand the general concepts but I'm struggling with a few of the details.
In my example application, I fetch a bunch of prices and then use the Kafka Streams API to produce average prices for each product to a compacted Kafka topic (Topic-A). I have a second service that I want to react to these averaged price updates. So in that second service, I create a KTable over Topic-A, and I can query its store successfully.
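To make the setup concrete, this is roughly what the second service does. The serde types, store name, application id, and product key are simplified placeholders, not my actual values:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class AveragePriceReader {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Materialize Topic-A as a KTable with a named, queryable store.
        KTable<String, Double> averagePrices = builder.table(
                "Topic-A",
                Consumed.with(Serdes.String(), Serdes.Double()),
                Materialized.as("average-prices-store"));

        // React to each averaged-price update as it arrives.
        averagePrices.toStream()
                .foreach((productId, avgPrice) ->
                        System.out.println(productId + " -> " + avgPrice));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "second-service");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Once the instance reaches RUNNING, the local store can be
        // queried on demand for the latest value per key.
        ReadOnlyKeyValueStore<String, Double> store = streams.store(
                StoreQueryParameters.fromNameAndType(
                        "average-prices-store",
                        QueryableStoreTypes.keyValueStore()));
        Double latest = store.get("product-123");
    }
}
```

This requires a running Kafka cluster, so treat it as a sketch of the topology rather than something runnable in isolation.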
My goal is to have this second service process & react to these averaged prices in real-time, but also to have access to the latest value for each product on demand. I believe I can use a KTable and Store to do this.
Initially, I believed:
- The KTable is backed by a local store (a RocksDB instance)
- When the KTable is initialized, it consumes the entirety of Topic-A to build its state
However, it seems as though KTables are (or can be?) backed by a compacted change-log.
Does this mean that upon initialization, the KTable only needs to consume the latest record for each key?
If I run multiple instances of my second service, do the KTables share a changelog? I imagine that if the number of instances was scaled up or down, each instance would need to update its local state to account for data from more/fewer partitions.
Would using a GlobalKTable give me all the K/V pairs available in each instance?
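For context, the GlobalKTable variant I have in mind would look something like this (again, the serdes and store name are placeholders):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.Materialized;

public class GlobalAveragePriceReader {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // My understanding is that a GlobalKTable consumes ALL partitions
        // of Topic-A into every instance's local store, rather than only
        // the partitions assigned to that instance.
        GlobalKTable<String, Double> averagePrices = builder.globalTable(
                "Topic-A",
                Consumed.with(Serdes.String(), Serdes.Double()),
                Materialized.as("global-average-prices-store"));

        // ...build KafkaStreams from this topology and query the store
        // the same way as with a regular KTable.
    }
}
```

As above, this is only a sketch; it needs a Kafka cluster to actually run against.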