
Question: What are the implications of having multiple instances of the org.apache.kafka.streams.KafkaStreams class in a single JVM (e.g., memory, CPU usage, concurrency concerns)?

Background: I am trying to provide a bulkheading mechanism so that, if a stream operation throws an exception, it does not transition the entire KafkaStreams instance into an ERROR state. I have divided the application into separate KafkaStreams instances, each responsible for a different task (logging, external web calls, db calls, etc.).
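A simplified sketch of what I mean (the topic names, application ids, and the buildInstance helper below are illustrative placeholders, not my real code):

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class BulkheadedStreams {

    // Placeholder helper: builds one isolated KafkaStreams instance per concern.
    static KafkaStreams buildInstance(String appId, String inputTopic, String outputTopic) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, appId);            // one id per bulkhead
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream(inputTopic).to(outputTopic);                       // placeholder topology

        return new KafkaStreams(builder.build(), props);
    }

    public static void main(String[] args) {
        // Separate instances, so a failure in one does not put the others into ERROR.
        KafkaStreams loggingStreams = buildInstance("logging-app", "logging-input", "logging-output");
        KafkaStreams webCallStreams = buildInstance("webcall-app", "webcall-input", "webcall-output");

        loggingStreams.start();
        webCallStreams.start();
    }
}
```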

I have not been able to find documentation on how to (1) recover a KafkaStreams instance from an ERROR state, (2) design an application that applies bulkheading principles using KafkaStreams, or (3) justify or refute my current approach.

If my approach violates a documented best practice from Confluent or Kafka then that would be helpful to know as well.

Application Versions: Kafka 1.0.0, Kafka-streams 1.0.0


1 Answer


If a KafkaStreams instance ends up in ERROR state, you need to close() it and create a new instance that you can start to replace the old one.
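A minimal sketch of what that restart could look like, assuming you keep a factory that can rebuild an equivalent instance from the same topology and config (the ErrorStateRestarter class and the Supplier wiring are illustrative, not part of the Streams API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import org.apache.kafka.streams.KafkaStreams;

// Sketch of a restart helper: watches for the ERROR state and swaps in a freshly
// built instance. The Supplier is assumed to rebuild an equivalent KafkaStreams
// object from the same topology and configuration.
public class ErrorStateRestarter {

    private final ExecutorService restartExecutor = Executors.newSingleThreadExecutor();

    public void startWithRestart(Supplier<KafkaStreams> streamsFactory) {
        KafkaStreams streams = streamsFactory.get();
        streams.setStateListener((newState, oldState) -> {
            if (newState == KafkaStreams.State.ERROR) {
                // ERROR is terminal: close() the dead instance and start a new one.
                // Hand off to another thread, since close() from the callback thread can block.
                restartExecutor.submit(() -> {
                    streams.close();
                    startWithRestart(streamsFactory);
                });
            }
        });
        streams.start();
    }
}
```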

For running multiple KafkaStreams instances in a single JVM: this is basically OK. Note that if those instances belong to the same application, you need to configure them with different state directories to isolate them from each other. Otherwise, they are isolated from each other automatically.
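For the same-application case, that would roughly look like the following (topic names and state-directory paths are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class SameAppTwoInstances {

    // Placeholder topology; topic names are illustrative only.
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");
        return builder.build();
    }

    static Properties config(String stateDir) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "shared-app");      // same application
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.STATE_DIR_CONFIG, stateDir);               // must differ per instance
        return props;
    }

    public static void main(String[] args) {
        KafkaStreams instanceA = new KafkaStreams(buildTopology(), config("/tmp/kafka-streams/instance-a"));
        KafkaStreams instanceB = new KafkaStreams(buildTopology(), config("/tmp/kafka-streams/instance-b"));
        instanceA.start();
        instanceB.start();
    }
}
```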

To me, your design makes sense. Note, though, that it is more resource intensive, as KafkaConsumer and KafkaProducer instances cannot be shared in your setup.