3
votes

Just wanna understand the basics properly.

Let's say I have a topic called "myTopic" that has 3 partitions: P0, P1 and P2. Each of these partitions has a leader, and the data (messages) for this topic is distributed across these partitions.

1. The producer will always write to the leader of the partition, in a round-robin fashion based on the load on the broker. Is that right?

2. How does the producer know the leader of the partition?

3. A consumer reading a particular topic should read all partitions of that topic. Is that correct?

Appreciate your help.

2
In SO, there is a very special way to practically say "Appreciate your help": accepting and/or upvoting helpful answers (which take up valuable time for respondents...) – desertnaut

2 Answers

4
votes
  1. The producer will always write to the leader of the partition, in a round-robin fashion based on the load on the broker. Is that right?

By default, yes.

That said, a producer can also use a custom partitioning scheme, i.e. a different strategy for choosing which partition each message is written to.

  2. How does the producer know the leader of the partition?

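Through the Kafka protocol: the producer sends a metadata request to a broker, and the response lists the current leader of each partition.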

  3. A consumer reading a particular topic should read all partitions of that topic. Is that correct?

By default, yes.

That said, you can also write consumer applications with custom logic, e.g. a "sampling" consumer that only reads from 1 out of N partitions.

3
votes

The producer will always write to the leader of the partition

Yes, always.

in a round robin fashion based on the load on the broker

No. If a partition is explicitly set on a ProducerRecord, then that partition is used. Otherwise, if a custom partitioner implementation is provided, that determines the partition. Otherwise, if the message key is not null, the hash of the key is used, so messages with the same key are consistently sent to the same partition. Only if the message key is null is the message sent to a partition chosen in a round-robin fashion. Even then, the choice is made irrespective of the load on the broker.
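To make that decision order concrete, here is a minimal sketch in plain Python. It is a toy model, not Kafka's actual code: the class name `PartitionChooser` and its parameters are made up for illustration, and CRC32 stands in for the murmur2 hash that the Java client uses for keyed messages.

```python
import zlib
from itertools import count
from typing import Callable, Optional


class PartitionChooser:
    """Toy model of the partition decision order; not Kafka's real implementation."""

    def __init__(
        self,
        num_partitions: int,
        custom_partitioner: Optional[Callable[[Optional[bytes], int], int]] = None,
    ) -> None:
        self.num_partitions = num_partitions
        self.custom_partitioner = custom_partitioner
        self._counter = count()  # drives round-robin for keyless messages

    def choose(self, key: Optional[bytes] = None, partition: Optional[int] = None) -> int:
        # 1. An explicitly set partition always wins.
        if partition is not None:
            return partition
        # 2. Otherwise a custom partitioner, if one was provided, decides.
        if self.custom_partitioner is not None:
            return self.custom_partitioner(key, self.num_partitions)
        # 3. Otherwise a non-null key is hashed, so equal keys map to the
        #    same partition (CRC32 here is a stand-in for murmur2).
        if key is not None:
            return zlib.crc32(key) % self.num_partitions
        # 4. No key: rotate through the partitions, ignoring broker load.
        return next(self._counter) % self.num_partitions
```

With `chooser = PartitionChooser(3)`, repeated calls to `chooser.choose(key=b"user-1")` return the same partition every time, while repeated calls to `chooser.choose()` cycle through 0, 1, 2 and back to 0.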

  2. How does the producer know the leader of the partition?

By periodically asking the broker for metadata.
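Roughly, the client keeps a cached partition-to-leader map and refreshes it once it gets too old. The sketch below is a simplified illustration in plain Python, not the real client: `MetadataCache` and `fetch_leaders` are hypothetical names, and the 300-second default only mirrors the producer's `metadata.max.age.ms` default of 5 minutes. As far as I know, the real producer also refreshes metadata when a send fails because the leader has moved, which is not modelled here.

```python
import time
from typing import Callable, Dict, Optional


class MetadataCache:
    """Toy client-side cache of partition -> leader broker."""

    def __init__(
        self,
        fetch_leaders: Callable[[], Dict[int, str]],  # hypothetical: asks a broker for metadata
        max_age_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_leaders
        self._max_age_s = max_age_s
        self._clock = clock
        self._leaders: Dict[int, str] = {}
        self._fetched_at: Optional[float] = None

    def leader_for(self, partition: int) -> str:
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._max_age_s
        # Refresh when the cache is empty, too old, or missing this partition.
        if stale or partition not in self._leaders:
            self._leaders = self._fetch()
            self._fetched_at = now
        return self._leaders[partition]
```

The producer then sends each batch directly to whatever broker `leader_for(partition)` returns, which is why a write always ends up at the leader.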

  3. A consumer reading a particular topic should read all partitions of that topic. Is that correct?

Consumers form consumer groups. If there are multiple consumer instances in a consumer group, each consumes a subset of the partitions. But the consumer group as a whole consumes from all partitions. That is, unless you decide to go "low-level" and manage that yourself, which you can do.
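As an illustration of how a group splits partitions among its members, here is a small sketch in plain Python. It is only a toy round-robin split: the function name `assign_partitions` is made up, and Kafka's real assignors (range, round-robin, sticky) are more involved.

```python
from typing import Dict, List


def assign_partitions(partitions: List[str], members: List[str]) -> Dict[str, List[str]]:
    """Toy round-robin split of a topic's partitions across group members."""
    if not members:
        raise ValueError("a consumer group needs at least one member")
    ordered_members = sorted(members)
    assignment: Dict[str, List[str]] = {m: [] for m in ordered_members}
    # Deal the partitions out one at a time, like cards.
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered_members[i % len(ordered_members)]].append(partition)
    return assignment
```

For "myTopic" with P0, P1 and P2 and two consumers c1 and c2, this gives c1 the partitions P0 and P2 and c2 the partition P1: each consumer reads a subset, and together they cover all three. With more consumers than partitions, the extra consumers get nothing and sit idle.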