2
votes

Suppose a Kafka topic with 3 partitions is being consumed by a consumer group of 3 consumers. In a cloud environment, a new consumer is scaled up, so there are now 4 consumers in that group. What happens in this situation?

  • Does Kafka create another partition so that the new consumer can read from it?

OR

  • Does the new consumer sit idle and not consume anything?

1 Answer

5
votes

Does Kafka create another partition so that the new consumer can access it?

No, Kafka won't create another partition for the new consumer. I recommend reading the Kafka docs to understand Kafka's architecture.

A consumer is just a client, while a Kafka topic lives on the Kafka brokers, which act as the server. So adding a consumer is just adding a client; for the server, it is only one more connection. A topic's partition count is specified when the topic is created, and you can also change it afterwards (it can be increased, but not decreased). See http://kafka.apache.org/documentation/#operations for how to create and modify a topic.

Does the new consumer sit idle and does not consume anything?

Yes. With 3 partitions and 4 consumers, one consumer in the group ends up with nothing to consume. When a consumer is added to or removed from a consumer group, it triggers a consumer rebalance.

The consumer rebalancing algorithm allows all the consumers in a group to come into consensus on which consumer is consuming which partitions. Consumer rebalancing is triggered on each addition or removal of both broker nodes and other consumers within the same group. For a given topic and a given consumer group, broker partitions are divided evenly among consumers within the group. A partition is always consumed by a single consumer. This design simplifies the implementation. Had we allowed a partition to be concurrently consumed by multiple consumers, there would be contention on the partition and some kind of locking would be required. If there are more consumers than partitions, some consumers won't get any data at all. During rebalancing, we try to assign partitions to consumers in such a way that reduces the number of broker nodes each consumer has to connect to.

Pay attention to these two sentences: "A partition is always consumed by a single consumer" and "If there are more consumers than partitions, some consumers won't get any data at all." The first one has a precondition: it holds within a single consumer group. Two consumers that belong to different groups can consume the same partition. See http://kafka.apache.org/documentation/#impl_brokerregistration for more about the rebalancing algorithm.
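To make the "within a single consumer group" precondition concrete, here is a small Python simulation. It does not talk to Kafka at all: the group names `billing` and `audit`, the consumer ids, and the simple round-robin assignment are all assumptions made up for the illustration.

```python
def owners_by_partition(groups, partitions):
    """Map each partition to the (group, consumer) pairs that read it.

    Hypothetical helper: inside one group every partition gets exactly one
    owner (picked round-robin here), but every group gets its own owner.
    """
    owners = {p: [] for p in partitions}
    for group, consumers in groups.items():
        for idx, partition in enumerate(partitions):
            owners[partition].append((group, consumers[idx % len(consumers)]))
    return owners


# One group with more consumers than partitions, one group with a single consumer.
groups = {"billing": ["b1", "b2", "b3", "b4"], "audit": ["a1"]}
owners = owners_by_partition(groups, [0, 1, 2])

# Consumers that own at least one partition, and those that own none.
busy = {consumer for pairs in owners.values() for _, consumer in pairs}
idle = [c for consumers in groups.values() for c in consumers if c not in busy]

print(owners)
print("idle:", idle)
```

Each partition ends up with one owner per group, so partition 0 is read by both `b1` and `a1`, while the fourth consumer of the `billing` group has nothing to read.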

The algorithm is very simple. It first computes a ratio = partition count / consumer count, and then distributes the partitions to the consumers in order, sorted by each partition's broker ID. The ordering is meant to reduce the number of brokers each consumer has to connect to.
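As a rough illustration of the ratio-based split described above, here is a minimal Python sketch. It is not Kafka's actual code: the function name `assign_partitions`, the consumer ids, and the rule that leftover partitions go to the first consumers are assumptions made for the example, and the broker-ordering step is left out.

```python
def assign_partitions(partitions, consumers):
    """Split a sorted partition list into contiguous ranges, one per consumer.

    Hypothetical sketch of a range-style assignment, not Kafka's implementation.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    # base = partitions per consumer, extra = leftovers handed to the first consumers
    base, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


three = assign_partitions([0, 1, 2], ["c1", "c2", "c3"])
four = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
print(three)
print(four)
```

With 3 partitions and 3 consumers every consumer gets one partition. With 4 consumers the ratio drops below 1, so in this sketch `c4` is left with an empty list, which matches the "some consumers won't get any data at all" case.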

So in your case nothing changes on the topic side: it still has 3 partitions. My guess is that the Kafka server code has a check like this:

if partition_count <= consumer_count
    just return, do not do the rebalancing.