1
votes

I'm testing adding Kafka partitions to a running system, but it isn't clear to me how Kafka manages the existing data when you add partitions to an existing topic.

For example:

  1. I have a Kafka instance with a topic named test that has 1 partition and 1 replica.
  2. The producers start writing to that topic and the consumer group starts consuming from it.
  3. I alter the topic to add another partition.

What happens to the topic's data in this case? Is it rebalanced across both partitions, or will only newly produced data use the new partition?


2 Answers

2
votes

Adding partitions doesn't change the partitioning of existing data; Kafka's partition logs are append-only. For example, if data is partitioned by hash(key) % number_of_partitions, adding partitions can change which partition a given key maps to, and Kafka will not attempt to rebalance or redistribute the records that were already written.
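To make the append-only point concrete, here is a minimal toy model in Python. It is only a sketch, not Kafka code: `ToyTopic` and `partition_for` are hypothetical names, and `zlib.crc32` is a stand-in for the hash (Kafka's default partitioner uses murmur2 on the serialized key).

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in hash for illustration only; Kafka's default partitioner
    # uses murmur2, not CRC32.
    return zlib.crc32(key) % num_partitions


class ToyTopic:
    """Hypothetical in-memory model: one append-only list per partition."""

    def __init__(self, num_partitions: int = 1):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: bytes, value: str) -> int:
        # The partition is chosen from the partition count at send time.
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p

    def add_partition(self) -> None:
        # A new, empty log is added; existing logs are not touched.
        self.partitions.append([])


topic = ToyTopic(num_partitions=1)
for i in range(8):
    topic.produce(f"key-{i}".encode(), f"old-{i}")

before = list(topic.partitions[0])  # snapshot of partition 0
topic.add_partition()

for i in range(8):
    topic.produce(f"key-{i}".encode(), f"new-{i}")

for number, log in enumerate(topic.partitions):
    print(number, [value for _, value in log])
```

In this model all the "old-" records stay in partition 0 after the partition is added, and partition 1 only ever holds "new-" records.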

1
votes

Adding a partition doesn't trigger any redistribution of the data already in the topic's existing partitions. The new partition receives only newly produced data, and there is one problem to consider when you add it. If you are using the default partitioner and sending messages with a key, the partition is chosen as hash(key) % number_partitions. Kafka guarantees that messages with the same key go to the same partition, but that no longer holds across a change in partition count: number_partitions in the formula changes, so a message with key = k1 that went to partition 0 before the change could go to partition 1 afterwards.
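A quick way to see the remapping is to compute hash(key) % number_partitions before and after the change. The snippet below is a rough Python sketch under stated assumptions: `partition_for` and `moved_keys` are hypothetical helpers, and `zlib.crc32` stands in for the real hash (the default partitioner uses murmur2), so the exact partition numbers will differ from what a real producer computes.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in for hash(key) % number_partitions; not Kafka's murmur2.
    return zlib.crc32(key) % num_partitions


def moved_keys(keys, old_count: int, new_count: int):
    # Keys whose target partition differs between the two partition counts.
    return [
        k for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    ]


keys = [f"k{i}".encode() for i in range(1, 11)]

# Going from 1 partition to 2, as in the question.
for k in keys:
    print(k.decode(), partition_for(k, 1), "->", partition_for(k, 2))

moved = moved_keys(keys, 1, 2)
print(f"{len(moved)} of {len(keys)} keys now map to a different partition")
```

Every key that moves has its older records in the old partition and its newer records in the new one, so per-key ordering across the change is no longer covered by the single-partition guarantee.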