here are my suggestions/opinions on the questions, you might want to do further research into some of the core Kafka Streams related questions.
- Not entirely clear what use-case/design you are proposing. The way I understood it, you have an external system (such as a database) and you want to extract that data as a key/value pair which could be translated into a
KTable
. In Kafka Streams, as you indicated in your question #3, the source of truth is the Kafka topic. Therefore, you need to bring the data from the external system into a Kafka topic first, and then materialize that as a KTable
in Kafka Streams. There are established patterns such as the Change Data Capture (CDC) for exporting data from external systems to a Kafka topic in almost real-time. KTable
can be materialized into state storage which is by default backed up RocksDB. The same information is also replicated by Kafka changelog topics and therefore applies the guarantees provided by data in a Kafka topic. I hope that someone from the Kafka Streams team can chime in on this specific topic for more information needed.
- Spring Cloud Stream provides a binder for Kafka Streams using which you can establish bindings to Kafka topics through various Kafka Streams types such as
KStream
, KTable
and GlobalKTable
. See the reference docs for more details. The binder provides several convenient options for data types with Serde inference in the case of common data types. The question about Avro data types is really dependent on your use cases and how you want to manage the schema structure for the data. If centralized schema management is a concern, then avro is a good choice. You can use Confluent's schema registry for Avro with Spring Cloud Stream. Spring provides a schema registry, but for Kafka Streams workloads that require avro, we recommend using the Confluent schema registry as it has more features. Either way, it should work and we provide a number of sample applications demonstrating schema evolution here.
- As I mentioned in the answer for #1, yes, the source of truth is Kafka topics and the Spring Cloud Stream binder provides binding mechanisms for connecting to Kafka topics and translate the data as
KStream
or KTable
.
- Here again, I am not following the actual use-case. However, Kafka Streams provides many different API methods which allow you to transform the incoming data so that other
KStream
types can be created dynamically. For instance, you apply a map
or flatMap
operation on the incoming KStream
and thus create a new KStream
from it. Not sure, if that is what you meant. If that is the case, then it really becomes a business logic concern. This is certainly possible.
Hope this helps, once again, these are my thoughts around these, and for some of these questions, there is no right or wrong answer. You need to consider the use case and design options carefully and choose the right path that fits your needs.