
I have read several answers on Stack Overflow indicating that using multiple consumer groups lets multiple consumers read from the same partition of the same topic concurrently.

For example,

Can multiple Kafka consumers read from the same partition of same topic by default?

How Kafka broadcast to many Consumer Groups

Parallel Producing and Consuming in Kafka

This is a follow-up to my previous question, in a slightly different context. We can only read from and write to a partition leader, Kafka logs are stored on hard disk, and each partition represents a log.

Now suppose I have 100 consumer groups reading from the same partition of the same topic. Because reads are only allowed from the partition leader and not from its replicas, all of them are effectively reading from the same machine. How does Kafka scale this kind of read load?

How does it achieve parallelism? Is it just spawning many threads and processes on the same machine to handle all the consumption concurrently? How can this approach scale horizontally?

Thank you

Kafka is not meant to be used that way, that is, producing to and consuming from a single partition of a single topic. It "scales" by partitioning, and if you don't intend to use partitioning, it doesn't scale well. - alirabiee

1 Answer


If you have 100 consumers all reading from the same partition, the data for that partition will be cached in the Linux OS page cache (memory), so 99 or perhaps even all 100 of the consumers will fetch the data from RAM instead of from a spinning hard disk. Although Kafka is written in a JVM language, it is designed to leverage off-heap memory in this way, which gives extra performance when many consumers read the same data in parallel.
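To make the caching argument concrete, here is a rough toy model in Python. It is not Kafka code and does not use any Kafka API: `ToyPartitionLog`, `PAGE_SIZE`, and `simulate` are hypothetical names invented for this illustration, the "page cache" is just a dictionary, and the consumer groups are run one after another rather than truly in parallel. It only sketches the idea that each group tracks its own offset while the underlying data is read from disk once and then served from memory.

```python
from collections import defaultdict

PAGE_SIZE = 4  # records per cached "page" (toy value, not a Kafka setting)


class ToyPartitionLog:
    """Toy stand-in for one partition's log on the leader broker."""

    def __init__(self, records):
        self.records = list(records)  # stands in for the on-disk log
        self.page_cache = {}          # page number -> records (stands in for the OS page cache)
        self.disk_reads = 0
        self.cache_hits = 0

    def _read_page(self, page_no):
        # Serve the page from memory if present; otherwise "read from disk" and cache it.
        if page_no in self.page_cache:
            self.cache_hits += 1
        else:
            self.disk_reads += 1
            start = page_no * PAGE_SIZE
            self.page_cache[page_no] = self.records[start:start + PAGE_SIZE]
        return self.page_cache[page_no]

    def fetch(self, offset, max_records):
        """Return up to max_records records starting at the given offset."""
        out = []
        end = min(offset + max_records, len(self.records))
        pos = offset
        while pos < end:
            page = self._read_page(pos // PAGE_SIZE)
            idx = pos % PAGE_SIZE
            take = min(end - pos, PAGE_SIZE - idx)
            out.extend(page[idx:idx + take])
            pos += take
        return out


def simulate(num_groups=100, num_records=20, batch=4):
    """Let every consumer group read the whole partition from offset 0."""
    log = ToyPartitionLog(f"msg-{i}" for i in range(num_records))
    offsets = defaultdict(int)    # each group keeps its own independent offset
    received = defaultdict(list)
    for group in range(num_groups):
        while offsets[group] < num_records:
            records = log.fetch(offsets[group], batch)
            received[group].extend(records)
            offsets[group] += len(records)
    return log, received


if __name__ == "__main__":
    log, received = simulate()
    print("disk reads:", log.disk_reads)
    print("cache hits:", log.cache_hits)
```

With the default toy parameters, only the first group's fetches touch the simulated disk; every later group is served from the in-memory dictionary. A real broker is more involved than this, so treat the counts as an illustration of the principle and not as a performance estimate.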