13
votes

In the Kafka Streams Developer Guide it says:

Kafka Streams applications can only communicate with a single Kafka cluster specified by this config value. Future versions of Kafka Streams will support connecting to different Kafka clusters for reading input streams and writing output streams.

Does this mean that my whole application can only connect to a single Kafka Cluster or each instance of KafkaStreams can only connect to a single cluster?

Could I create multiple KafkaStreams instances with different properties that connect to different clusters?

2 Answers

11
votes

It means that a single application can only connect to one cluster.

  • You cannot read a topic from cluster A and write the result of your computation to cluster B.
  • It's not possible to read two topics from two different clusters with the same instance.

Could I create multiple KafkaStreams instances with different properties that connect to different clusters?

Yes, absolutely. But those different instances will be different applications. (Think "consumer groups": Kafka Streams uses the application.id as the consumer group id.)

Update:

Within a single JVM, you can create as many KafkaStreams instances as you like, and you can configure each of them to connect to a different cluster. You can also use the same KStreamBuilder for all of them if you want to do the same processing. (As of Kafka 1.0, KStreamBuilder is deprecated in favor of StreamsBuilder.)
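
To make the "different properties per instance" idea concrete, here is a minimal sketch of two configurations that point at two clusters. It only uses java.util.Properties so it runs without Kafka on the classpath. The config keys "application.id" and "bootstrap.servers" are the real Kafka Streams keys; the class name, application ids, host names, and the topologyA/topologyB variables in the comments are made up for illustration.

```java
import java.util.Properties;

public class MultiClusterConfigs {

    // Builds the configuration for one KafkaStreams instance.
    // The keys are real Kafka Streams config names; the values are placeholders.
    static Properties configFor(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical application ids and cluster addresses.
        Properties clusterA = configFor("orders-app-a", "kafka-a.example.com:9092");
        Properties clusterB = configFor("orders-app-b", "kafka-b.example.com:9092");

        // With kafka-streams on the classpath, each config would back its own
        // instance (topologyA and topologyB are assumed to be built elsewhere):
        //   KafkaStreams streamsA = new KafkaStreams(topologyA, clusterA);
        //   KafkaStreams streamsB = new KafkaStreams(topologyB, clusterB);
        //   streamsA.start();
        //   streamsB.start();
        System.out.println(clusterA.getProperty("bootstrap.servers"));
        System.out.println(clusterB.getProperty("bootstrap.servers"));
    }
}
```

Each instance still reads from and writes to only the cluster named in its own properties, so this runs two independent applications side by side rather than one pipeline that spans both clusters.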

6
votes

Just to add to the excellent answer from @Matthias J. Sax.

Does this mean that my whole application can only connect to a single Kafka Cluster or each instance of KafkaStreams can only connect to a single cluster?

I think there are two questions here.

It depends on the definition of "my whole application". It could be a single KafkaStreams instance, multiple instances on a single JVM, or perhaps multiple KafkaStreams instances on a single JVM in a Docker container that is executed as a pod. Either way, "my whole application" is too broad to give a precise answer.

The point is that there is no way to create a KafkaStreams instance that talks to multiple Kafka clusters. The configuration is passed as properties, i.e. key-value pairs in a map, so an instance has exactly one bootstrap.servers setting. That alone answers the question.
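
A small sketch of why the key-value configuration rules out a second cluster: setting the same key twice replaces the value instead of adding to it. This uses only java.util.Properties; "bootstrap.servers" is the real config key, while the class name and host names are placeholders.

```java
import java.util.Properties;

public class SingleClusterPerConfig {

    // Returns the cluster address that is left in the configuration
    // after the same key has been assigned twice.
    static String effectiveBootstrapServers() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka-a.example.com:9092");
        // Same key again: this replaces the first value, it does not add a second cluster.
        props.setProperty("bootstrap.servers", "kafka-b.example.com:9092");
        return props.getProperty("bootstrap.servers");
    }

    public static void main(String[] args) {
        System.out.println(effectiveBootstrapServers()); // prints kafka-b.example.com:9092
    }
}
```

Note that a comma-separated bootstrap.servers value does not get around this: it lists several brokers used to bootstrap into the same cluster, not several clusters.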


Being unable to use two or more Kafka clusters in a single Kafka Streams application is one of the differences between Kafka Streams and Spark Structured Streaming. The latter can use as many Kafka clusters as you want, so you can build pipelines between different Kafka clusters.