
I have some questions about Kafka Streams and how it works. I am experienced with the general Kafka consumer/producer paradigm, but this is my first time trying to use Kafka Streams.

Questions:

  1. In the general Kafka consumer model, we subscribe to a topic and start consuming from a partition. For simplicity's sake, let's say we have 1 partition and 1 consumer. If we want to increase processing capacity, we increase the number of partitions and add more consumers. How does this work in Kafka Streams? If we increase partitions, how should we scale up the app? Do we need to add more machines, or do something else?
  2. When I consume data via Kafka consumers, I may do something with the messages. For example, I may query an API, download a file, write it to an NFS, and forward the message; or write the incoming message value to a database and then forward a notification to another Kafka topic. How is this use case solved when we are not following the KAFKA -> KAFKA paradigm but instead KAFKA -> PROCESS (STORE IN DB) -> KAFKA? Can Kafka Streams even solve this use case?
  3. Lastly, how are exceptions handled and how are offsets managed? In an always-running production system with an endless stream of incoming messages, in case of an exception, say due to a network outage, we shut down the consumers and do a clean bring-up. How do we achieve the same with a Kafka Streams processing app?

1 Answer

  1. The Consumer API is still working behind the scenes in exactly the same way. To answer the question: you start more running instances of the application; these don't necessarily have to be on completely different servers, and each instance can also run multiple stream threads.
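As a sketch, scaling a Streams app is mostly configuration. The hypothetical properties below assume a broker at `localhost:9092` and an application id of `my-streams-app`; `num.stream.threads` adds parallelism within one instance, while starting another instance with the same `application.id` triggers a rebalance that spreads partitions across all instances:

```
# Hypothetical Kafka Streams configuration (application id and broker address are assumptions)
application.id=my-streams-app        # all instances sharing this id form one consumer group
bootstrap.servers=localhost:9092
num.stream.threads=4                 # parallelism within a single instance
```

As with plain consumers, total parallelism is still capped by the partition count of the input topics; threads or instances beyond that number sit idle.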

  2. It's not really recommended to use Kafka Streams for remote work that isn't confined to Kafka-to-Kafka interaction, at least not without accepting that this introduces latency. It therefore shouldn't be done when performing topic joins that depend on time windows, for example. Kafka Connect can be the system that takes data from a topic to a database.
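For the KAFKA -> DB leg, a minimal sketch of a Kafka Connect sink configuration in standalone `.properties` form, assuming Confluent's JDBC sink connector is installed; the connector name, topic, and connection URL here are illustrative assumptions:

```
# Hypothetical JDBC sink connector config (name, topic, and URL are assumptions)
name=notifications-db-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=incoming-messages
connection.url=jdbc:postgresql://localhost:5432/mydb
auto.create=true                     # create the destination table if it doesn't exist
```

With this split, the Streams app stays Kafka-to-Kafka (including the notification to the downstream topic), and Connect handles the database write with its own offset tracking and retry behavior.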

  3. Again, Kafka Streams is just a layer over the Producer/Consumer APIs. You'll still get the same network exceptions, and if you read a corrupt record (a "poison pill"), there are built-in options for handling such records.
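One concrete knob for the poison-pill case is the deserialization exception handler in the Streams config. The handler classes below are real built-ins; which one to use is an application-level decision:

```
# LogAndFailExceptionHandler (the default) stops the application on a corrupt record;
# LogAndContinueExceptionHandler logs it, skips it, and keeps processing.
default.deserialization.exception.handler=org.apache.kafka.streams.errors.LogAndContinueExceptionHandler
```

Offsets are committed automatically by the Streams runtime, so after a crash or a clean shutdown the app resumes from the last committed position, much like a plain consumer group.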