I know that each partition is allocated to one Kafka consumer (within a consumer group), but one Kafka consumer can consume multiple partitions at the same time. If it holds an open connection per partition, then I can imagine tens of thousands of connections open per consumer. If this is true, that seems like something to watch out for when deciding on the number of partitions, no?
1 Answer
I'm assuming you are asking about the official Java client. Third-party clients may behave differently.
The KafkaConsumer does not have a network connection per partition. As you hinted, that would not scale very well.
Instead the KafkaConsumer has a connection to each broker/node that is the leader of a partition it is consuming from. Data for partitions that have the same leader is transmitted using the same connection. It also uses an additional connection to the Coordinator for its group. So at worst it can have <# of brokers in the cluster> + 1 connections to the Kafka cluster.
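To make the bound concrete, here is a minimal sketch of the arithmetic. It does not use the Kafka client at all: `ConnectionEstimate`, `maxConnections` and the partition-to-leader map are hypothetical names made up for illustration, and the broker ids are invented.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the Kafka client API.
class ConnectionEstimate {

    // partitionLeaders maps an assigned partition (e.g. "orders-0")
    // to the id of the broker that currently leads it.
    static int maxConnections(Map<String, Integer> partitionLeaders) {
        // Partitions that share a leader share a connection,
        // so only distinct leader brokers count.
        Set<Integer> leaders = new HashSet<>(partitionLeaders.values());
        // Plus one extra connection to the group coordinator.
        return leaders.size() + 1;
    }

    public static void main(String[] args) {
        // Six assigned partitions, but only three distinct leader brokers.
        Map<String, Integer> assignment = Map.of(
                "orders-0", 1, "orders-1", 2, "orders-2", 3,
                "orders-3", 1, "orders-4", 2, "orders-5", 3);
        System.out.println(maxConnections(assignment)); // prints 4
    }
}
```

The result depends on the number of distinct leaders, not on the number of partitions, so adding partitions on the same brokers leaves it unchanged.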
Have a look at NetworkClient.java; you'll see that connections are handled per Node (broker).