
I've created a table in KSQL from a Kafka topic. I pushed a set of data to the topic and the table was populated. I ran a query and got the expected response. Then I pushed the same data into the topic a second time, and the table was loaded again. Now, when I query, the response is 2 rows instead of 1, with 2 different ROWTIME timestamps.

I believe a KSQL table should overwrite the value when the same key comes in and retain only the latest value, but that's not happening. Is my understanding correct?

What should be done for the table to keep the latest value and discard the previous value when a row with the same key is inserted/updated? Thanks.
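For reference, a minimal sketch of the setup described above; the topic, table, and column names here are assumptions, not taken from the question (exact syntax varies by KSQL/ksqlDB version, e.g. newer ksqlDB uses a `PRIMARY KEY` column declaration instead of the `KEY` property):

```sql
-- Hypothetical table over an assumed 'users' topic, keyed by 'id'.
-- The table's materialized state holds one row per key, but the
-- underlying topic is an append-only changelog of updates.
CREATE TABLE users_table (
    id VARCHAR,
    name VARCHAR
) WITH (
    KAFKA_TOPIC = 'users',
    VALUE_FORMAT = 'JSON',
    KEY = 'id'
);
```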


1 Answer


As far as I know, it is not possible to use a log compaction policy to keep exactly one message per key. Even if you set cleanup.policy=compact (topic level) or log.cleanup.policy=compact (broker level), there is no guarantee that only the latest message will be kept and that all older ones will have been compacted away at any given moment.
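For completeness, these are the two configuration keys mentioned above, shown as plain properties (the topic name is only relevant for the topic-level override, which is typically applied with the kafka-configs tool or at topic creation time):

```properties
# Topic-level override: applies compaction to a single topic
cleanup.policy=compact

# Broker-level default (server.properties): applies to topics
# that do not override it
log.cleanup.policy=compact
```

Note that even with compaction enabled, records in the active segment are never compacted, so duplicates per key can remain visible for some time.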

According to the official Kafka documentation:

Log compaction gives us a more granular retention mechanism so that we are guaranteed to retain at least the last update for each primary key