Is data split across partitions?

Question

I read the Kafka documentation, but I am still confused when people talk about data and partitions. The documentation says that a client sends a message to a partition, the partition then replicates the message to its replicas (across brokers), and a consumer reads data from the partition.

I have a topic with 2 partitions. Let's say I have one producer, which sends messages to partition#1, and 2 consumers: one reads from partition#1 and the other from partition#2. Does that mean partition#1 will hold 50% of the messages and partition#2 the other 50%? Or, when a client sends data to partition#1, is that data replicated not only across brokers but also across partitions?

ppatierno ppatierno · Accepted Answer · 2019-12-15T11:38:29

About your specific example ...

If your producer sends messages without a key, the default partitioner (in the producer itself) applies a round-robin algorithm to spread messages across partitions: message 1 goes to partition 1, message 2 to partition 2, message 3 to partition 1, and so on. (Kafka 2.4 and later stick to one partition per batch instead of alternating per message, but the messages are still spread roughly evenly over time.) So you are right: partition 1 will get 50% of the messages. The consumer reading from partition 1 will get that 50%, and the other 50% will go to the consumer reading from partition 2. Data is not copied from one partition to another; each message is stored in exactly one partition. This is how Kafka achieves higher throughput and handles more consumers.

It's important to add that when a partition has more than one replica, one of them is designated the "leader" and the others are "followers". Message exchange always happens through the "leader". The "followers" are just copies: if the broker hosting the "leader" goes down, a "follower" hosted on another broker is elected as the new "leader".
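To make the 50/50 split concrete, here is a minimal sketch of the round-robin case described above. It is a toy simulation in plain Python, not the Kafka client: the function name `round_robin_assign` and the variables `consumer_1` and `consumer_2` are made up for illustration, and partitions are numbered from 1 to match the question (real Kafka numbers them from 0).

```python
from itertools import cycle


def round_robin_assign(messages, num_partitions):
    """Spread keyless messages over partitions in turn (toy model, no Kafka client)."""
    # Hypothetical in-memory stand-in for partitions, numbered 1..num_partitions.
    partitions = {p: [] for p in range(1, num_partitions + 1)}
    next_partition = cycle(partitions)  # yields 1, 2, 1, 2, ...
    for message in messages:
        partitions[next(next_partition)].append(message)
    return partitions


messages = [f"message {i}" for i in range(1, 7)]
partitions = round_robin_assign(messages, num_partitions=2)

# Each consumer reads only its own partition, so the two halves never overlap.
consumer_1 = partitions[1]
consumer_2 = partitions[2]
print(consumer_1)  # ['message 1', 'message 3', 'message 5']
print(consumer_2)  # ['message 2', 'message 4', 'message 6']
```

Each message ends up in exactly one of the two lists, which is the point of the answer: partitions divide the data, while replication (not modeled here) only copies a given partition to other brokers.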

I hope this helps.
