I am currently working on a Spring Boot application that uses the Spring Kafka consumer API.
Each message that arrives on a topic needs to be converted into a new object type, enriched with additional properties that will eventually come from other topics. Those other topics are not yet developed, so we are currently using mocked in-memory data to process the requests.
For example, when a new "shopping order" message arrives, I use a mocked "Customer" object and a mocked "Item" object to process the order. The plan is to eventually use a real Customer topic and a real Item topic.
Also, at the moment the application only uses Spring Kafka listeners to receive new orders. The listeners invoke a Spring bean method which processes the order and, using the same mocks mentioned above, creates a new object that is written to an output topic named customer-order.
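To make the current setup concrete, here is a simplified sketch of that listener. The domain classes (ShoppingOrder, Customer, Item, CustomerOrder), the mock repositories, the "shopping-order" topic name, and the group id are placeholders for the real ones:

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class OrderListener {

    private final MockCustomerRepository customers; // in-memory mock, to be replaced by a topic
    private final MockItemRepository items;         // in-memory mock, to be replaced by a topic
    private final KafkaTemplate<String, CustomerOrder> kafkaTemplate;

    public OrderListener(MockCustomerRepository customers,
                         MockItemRepository items,
                         KafkaTemplate<String, CustomerOrder> kafkaTemplate) {
        this.customers = customers;
        this.items = items;
        this.kafkaTemplate = kafkaTemplate;
    }

    @KafkaListener(topics = "shopping-order", groupId = "order-processor")
    public void onOrder(ShoppingOrder order) {
        // Enrich the incoming order with mocked customer and item data,
        // then write the result to the output topic.
        Customer customer = customers.findById(order.getCustomerId());
        Item item = items.findById(order.getItemId());
        kafkaTemplate.send("customer-order", order.getId(),
                new CustomerOrder(order, customer, item));
    }
}
```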
We are now thinking about evolving the architecture of this application, and I have been reading up on Kafka Streams. The documentation I have found online only covers simple examples such as word counts and joins. With my limited knowledge of Streams, I don't envision needing functionality such as aggregations (calculating totals, etc.).
I have thought of some options for the architecture:
1. Retain the consumer API, i.e. keep the Spring listener implementation for receiving new order messages, and pull in the Streams dependency just to create state stores that will eventually replace the mocked data (the idea being that this data will eventually come from the other topics). In this approach, the "streams" part of Kafka would be used only for maintaining state stores, not for processing incoming records; see the first sketch after this list.
2. Use purely the Kafka consumer API and make external API calls to fetch the data that lives outside my topic. This is my least preferred option, as I don't want to make an external API call for each new order.
3. Use Kafka Streams both for reading new incoming orders and for gathering and storing state, making use of joins and merges to process the data; see the second sketch below.
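For option 1, what I have in mind is roughly the following: Streams only materializes the future Customer topic into a local queryable store, and the existing listener queries that store instead of the mock. This is just a sketch under my own assumptions: a "customer" topic keyed by customer id with JSON values (an equivalent store would be needed for items):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafkaStreams;
import org.springframework.kafka.support.serializer.JsonSerde;

@Configuration
@EnableKafkaStreams
public class CustomerStoreConfig {

    public static final String CUSTOMER_STORE = "customer-store";

    @Bean
    public GlobalKTable<String, Customer> customerTable(StreamsBuilder builder) {
        // No record processing here: Streams just maintains a local,
        // queryable copy of the customer topic that replaces the mock.
        return builder.globalTable("customer",
                Materialized.<String, Customer, KeyValueStore<Bytes, byte[]>>as(CUSTOMER_STORE)
                        .withKeySerde(Serdes.String())
                        .withValueSerde(new JsonSerde<>(Customer.class)));
    }
}
```

The listener would then swap the mocked lookup for an interactive query against that store:

```java
// In the listener, replacing the mocked customer lookup. streamsBuilderFactoryBean
// is the auto-configured StreamsBuilderFactoryBean, injected into the listener.
ReadOnlyKeyValueStore<String, Customer> store = streamsBuilderFactoryBean
        .getKafkaStreams()
        .store(StoreQueryParameters.fromNameAndType(
                CustomerStoreConfig.CUSTOMER_STORE,
                QueryableStoreTypes.keyValueStore()));
Customer customer = store.get(order.getCustomerId());
```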
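Option 3 would look something like this: a pure Streams topology that reads the order stream and joins it against GlobalKTables for customers and items. Again just a sketch; the topic names, keys, serdes, and the CustomerOrder.of / withItem helpers are made up for illustration:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafkaStreams;
import org.springframework.kafka.support.serializer.JsonSerde;

@Configuration
@EnableKafkaStreams
public class OrderTopologyConfig {

    @Bean
    public KStream<String, CustomerOrder> customerOrders(StreamsBuilder builder) {
        KStream<String, ShoppingOrder> orders = builder.stream("shopping-order",
                Consumed.with(Serdes.String(), new JsonSerde<>(ShoppingOrder.class)));
        GlobalKTable<String, Customer> customers = builder.globalTable("customer",
                Consumed.with(Serdes.String(), new JsonSerde<>(Customer.class)));
        GlobalKTable<String, Item> items = builder.globalTable("item",
                Consumed.with(Serdes.String(), new JsonSerde<>(Item.class)));

        // Enrich each order with its customer, then with its item. A
        // KStream-GlobalKTable join does the lookup by foreign key, which is
        // what the mocks do today.
        KStream<String, CustomerOrder> enriched = orders
                .join(customers,
                        (orderId, order) -> order.getCustomerId(),
                        (order, customer) -> CustomerOrder.of(order, customer))
                .join(items,
                        (orderId, co) -> co.getItemId(),
                        (co, item) -> co.withItem(item));

        enriched.to("customer-order",
                Produced.with(Serdes.String(), new JsonSerde<>(CustomerOrder.class)));
        return enriched;
    }
}
```

With this shape, the listener and the bean method would go away entirely; the topology itself reads, enriches, and writes to customer-order.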
What do you suggest: 1, 2, or 3? Is it a good idea to use Streams for this kind of solution? Is there any benefit in moving this implementation to Kafka Streams, or am I better off staying with option 2?