Design questions considering Kafka Streams and Spring Cloud Stream

Question

I need to maintain records from external systems (as KTables) and track every change to those records (as KStreams).

The KTables will be queried with KSQL, while the KStreams will be consumed by an event monitor.

Questions:

1. I need the KTables to act as mirrors of the external systems. Will this design cause any problems with data storage, such as data loss or expiration?
2. Using Spring, what is the best approach for the data format? Avro with a schema registry?
3. The source of everything is a topic, right? So I will need to send messages to topics, and my KTables and KStreams would translate them as needed. Is that right?
4. The KTable definitions are known in advance, but a group of KStreams may be created dynamically. What is the best way to achieve this?

I'd appreciate any comments that could help me improve this design.

sobychacko · Accepted Answer · 2021-03-10T19:51:17

Here are my suggestions and opinions on your questions. You might want to do further research on some of the core Kafka Streams questions.

1. It is not entirely clear what use case or design you are proposing. The way I understood it, you have an external system (such as a database) and you want to extract its data as key/value pairs that can be turned into a KTable. In Kafka Streams, as you indicated in your question #3, the source of truth is the Kafka topic. Therefore, you need to bring the data from the external system into a Kafka topic first, and then materialize that topic as a KTable in Kafka Streams. There are established patterns, such as Change Data Capture (CDC), for exporting data from external systems to a Kafka topic in near real time. A KTable can be materialized into a state store, which is backed by RocksDB by default. The same data is also backed up to a Kafka changelog topic, so it benefits from the same guarantees as any other data in a Kafka topic. I hope someone from the Kafka Streams team can chime in with more information on this specific point.
2. Spring Cloud Stream provides a binder for Kafka Streams, with which you can bind Kafka topics to Kafka Streams types such as KStream, KTable, and GlobalKTable. See the reference docs for more details. The binder provides several convenient options for data types, including Serde inference for common data types. Whether to use Avro really depends on your use case and how you want to manage the schema of your data. If centralized schema management is a concern, then Avro is a good choice. You can use Confluent's Schema Registry for Avro with Spring Cloud Stream. Spring also provides a schema registry, but for Kafka Streams workloads that require Avro, we recommend the Confluent Schema Registry because it has more features. Either way, it should work, and we provide a number of sample applications demonstrating schema evolution here.
3. As I mentioned in the answer to #1, yes, the source of truth is Kafka topics, and the Spring Cloud Stream binder provides binding mechanisms for connecting to Kafka topics and translating their data into a KStream or a KTable.
4. Here again, I am not following the actual use case. However, Kafka Streams provides many API methods that let you transform incoming data so that other KStreams can be created dynamically. For instance, you can apply a map or flatMap operation to the incoming KStream and thus create a new KStream from it. I'm not sure if that is what you meant. If it is, then it really becomes a business logic concern, and it is certainly possible.
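On the storage question (data loss, expiration): a KTable is only as complete as the topic it is built from. If that topic uses time-based retention (`cleanup.policy=delete`), records for keys that have not changed recently can expire, which is why mirror-style tables are commonly fed from compacted topics (`cleanup.policy=compact`). The plain-Java model below, which has no Kafka dependency, sketches what a fully compacted partition retains: the last record per key, with a tombstone (null value) removing the key. It is a simplification under stated assumptions: a single partition already in offset order, and tombstones dropped immediately, whereas a real broker compacts in the background, never compacts the active segment, and keeps tombstones for `delete.retention.ms`. The names `ChangeRecord` and `CompactionModel` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type: one key/value record from a partition; a null value is a tombstone.
record ChangeRecord(String key, String value) {}

class CompactionModel {

    // Returns what a fully compacted partition would keep: the last record for each key,
    // in offset order. Keys whose last record is a tombstone are dropped entirely
    // (simplification: a real broker keeps tombstones for delete.retention.ms first).
    static List<ChangeRecord> compact(List<ChangeRecord> log) {
        Map<String, Integer> lastOffsetByKey = new HashMap<>();
        for (int offset = 0; offset < log.size(); offset++) {
            lastOffsetByKey.put(log.get(offset).key(), offset);
        }
        List<ChangeRecord> retained = new ArrayList<>();
        for (int offset = 0; offset < log.size(); offset++) {
            ChangeRecord current = log.get(offset);
            boolean isLatestForKey = lastOffsetByKey.get(current.key()) == offset;
            if (isLatestForKey && current.value() != null) {
                retained.add(current);
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        List<ChangeRecord> log = List.of(
                new ChangeRecord("customer-1", "Ann"),
                new ChangeRecord("customer-2", "Bob"),
                new ChangeRecord("customer-1", "Ann B."),   // update supersedes offset 0
                new ChangeRecord("customer-2", null));      // tombstone deletes customer-2
        // Only the latest live value survives: customer-1 -> "Ann B."
        System.out.println(compact(log));
    }
}
```

The point for a "mirror" design is that compaction bounds storage by the number of live keys rather than by time, so the latest state of every key is kept until it is explicitly deleted.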
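To make the "centralized schema management" point about Avro more concrete: with Confluent's Avro serializers, a record does not carry its full schema. Each serialized value starts with a magic byte `0` and a 4-byte big-endian schema ID, followed by the Avro-encoded data, and the consumer uses that ID to fetch the writer's schema from the registry. That is why every producer and consumer of the topic needs access to the same registry. The plain-JDK sketch below only frames and unframes bytes; it performs no Avro encoding, and the class name `RegistryWireFormat` and its methods are hypothetical. A real application would use the serializers or Serdes provided by Confluent rather than hand-rolled framing.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper that frames/unframes bytes in the Confluent Schema Registry wire format:
// [magic byte 0][4-byte big-endian schema ID][Avro-encoded payload]
class RegistryWireFormat {
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_LENGTH = 5;

    // Producer side: prepend the header to already-encoded Avro bytes.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(HEADER_LENGTH + avroPayload.length)
                .put(MAGIC_BYTE)
                .putInt(schemaId)   // ByteBuffer writes big-endian by default
                .put(avroPayload)
                .array();
    }

    // Consumer side: read the schema ID that would be looked up in the registry.
    static int schemaId(byte[] message) {
        if (message.length < HEADER_LENGTH || message[0] != MAGIC_BYTE) {
            throw new IllegalArgumentException("not in the expected wire format");
        }
        return ByteBuffer.wrap(message, 1, 4).getInt();
    }

    // Consumer side: the bytes after the header are the Avro payload.
    static byte[] payload(byte[] message) {
        schemaId(message); // validates the header before slicing
        return Arrays.copyOfRange(message, HEADER_LENGTH, message.length);
    }

    public static void main(String[] args) {
        byte[] placeholderAvro = {1, 2, 3}; // stand-in bytes, not real Avro output
        byte[] message = frame(42, placeholderAvro);
        System.out.println("schema id = " + schemaId(message));
        System.out.println("payload bytes = " + Arrays.toString(payload(message)));
    }
}
```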
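On the point that topics are the source of truth: whether the data reaches your code as a KStream or a KTable is a matter of how the topic's records are interpreted. The plain-Java model below (no Kafka dependency; `TopicRecord` and `StreamTableDuality` are hypothetical names) replays one list of records both ways. As a stream, every record is delivered as an independent event, which suits the event monitor. As a table, each record is an upsert for its key and a null value deletes the key, leaving the current state that queries would see.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical record type: one key/value record from a topic; a null value is a tombstone.
record TopicRecord(String key, String value) {}

class StreamTableDuality {

    // KStream-style interpretation: every record is an independent event.
    static void asStream(List<TopicRecord> topic, Consumer<TopicRecord> monitor) {
        topic.forEach(monitor);
    }

    // KTable-style interpretation: each record upserts its key; a tombstone removes it.
    static Map<String, String> asTable(List<TopicRecord> topic) {
        Map<String, String> table = new HashMap<>();
        for (TopicRecord r : topic) {
            if (r.value() == null) {
                table.remove(r.key());
            } else {
                table.put(r.key(), r.value());
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<TopicRecord> topic = List.of(
                new TopicRecord("order-1", "CREATED"),
                new TopicRecord("order-1", "SHIPPED"),
                new TopicRecord("order-2", "CREATED"),
                new TopicRecord("order-2", null));

        List<TopicRecord> events = new ArrayList<>();
        asStream(topic, events::add);
        System.out.println("stream saw " + events.size() + " events"); // all four changes
        System.out.println("table state: " + asTable(topic));          // only order-1 remains
    }
}
```

This is also why the question's split works naturally: the event monitor needs the full history of changes, while the mirror only needs the latest value per key.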
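On dynamically derived streams: the core idea behind `map`/`flatMap` is that one input event can produce zero or more output events, which can then be split by a value that is only known at runtime. The sketch below is a plain `java.util.stream` analogy, not the Kafka Streams API. `EntityChange`, `FieldChange`, and `DerivedStreams` are hypothetical names, and each grouped list stands in for what could be a separate KStream or output topic in a real topology.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical input event: an external record changed, touching some fields.
record EntityChange(String entityId, List<String> changedFields) {}

// Hypothetical derived event: one per changed field.
record FieldChange(String entityId, String field) {}

class DerivedStreams {

    // flatMap analogy: each input event yields zero or more derived events.
    static List<FieldChange> explode(List<EntityChange> changes) {
        return changes.stream()
                .flatMap(c -> c.changedFields().stream()
                        .map(field -> new FieldChange(c.entityId(), field)))
                .collect(Collectors.toList());
    }

    // Routing analogy: the set of groups is determined by the data, not declared up front.
    static Map<String, List<FieldChange>> routeByField(List<FieldChange> events) {
        return events.stream().collect(Collectors.groupingBy(FieldChange::field));
    }

    public static void main(String[] args) {
        List<EntityChange> changes = List.of(
                new EntityChange("customer-1", List.of("email", "phone")),
                new EntityChange("customer-2", List.of("email")),
                new EntityChange("customer-3", List.of())); // produces no derived events
        Map<String, List<FieldChange>> routed = routeByField(explode(changes));
        routed.forEach((field, events) -> System.out.println(field + " -> " + events));
    }
}
```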

Hope this helps. Once again, these are my thoughts on these questions, and for some of them there is no right or wrong answer. You need to consider your use case and design options carefully and choose the path that fits your needs.
