I was given the following architecture that I'm trying to improve.
I receive a stream of DB changes which end up in a compacted topic. The stream is basically key/value pairs and the keyspace is large (~4 GB).
The topic is consumed by one kafka stream process that stores the data in RockDB (separate for each consumer/shard). The processor does two different things:
- join the data into another stream.
- check if a message from the topic is a new key or an update to an existing one. If it is an update it sends the old key/value and the new key/value pair to a different topic (updates are rare).
The construct has a couple of problems:
- The two different functionalities of the stream processor belong to different teams and should not be part of the same code base. They are put together to save memory. If we separate it we would have to duplicate RockDB's.
- I would prefer to use a normal KTable join instead of the handcrafted join that's currently in the code.
- RockDB seems to be a bit of overkill if the data is already persisted in a topic. We currently running into some performance issues and I assume it would be faster if we just keep everything in memory.
Question 1: Is there a way to hook into the compaction process of a compacted topic? I would like a notification (to a different topic) for every key that is actually compacted (including the old and new value). If this is somehow possible I could easily split the code bases apart and simplify the join.
Question 2: Any other idea on how this can be solved more elegantly?