
We are planning to expand our cluster from 2 nodes to 8 nodes. The partition reassignment tool has options to move whole topics or individual partitions.

To redistribute the partitions, I am planning to follow the steps below.

Regardless of how many nodes are added, if I list all the topics in topics-to-move.json and all the brokers in the command below, will the partitions be distributed equally among the nodes?

bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics-to-move.json --broker-list "0,1,2,3,4,5,6,7" --generate
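For reference, the file passed to --topics-to-move-json-file is a JSON object that lists topic names plus a "version" field. The following is a minimal Python sketch that builds that document and the --broker-list string for the 8 brokers. The topic names are hypothetical, and the broker IDs 0 to 7 are assumed from the command above.

```python
import json

def topics_to_move(topics):
    """Build the document read by --topics-to-move-json-file."""
    return {"topics": [{"topic": t} for t in topics], "version": 1}

def broker_list(broker_ids):
    """Format broker IDs for --broker-list, e.g. "0,1,2"."""
    return ",".join(str(b) for b in broker_ids)

# Hypothetical topic names; replace them with the topics in your cluster.
print(json.dumps(topics_to_move(["orders", "payments"]), indent=2))
print(broker_list(range(8)))  # 0,1,2,3,4,5,6,7
```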

After this, I am planning to apply the generated JSON:

bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file generated-json-file --execute
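One way to check the "equal distribution" part before executing is to inspect the proposed plan itself. The plan printed by --generate, which is the file passed to --reassignment-json-file, lists each partition with its replica broker IDs. The first of those IDs is the preferred leader. The following is a small Python sketch that counts replicas and preferred leaders per broker. The plan here is a hypothetical, trimmed-down example. Real output may carry extra fields, such as log_dirs, which the sketch ignores. Note that these counts describe replicas, not bytes.

```python
import json
from collections import Counter

def distribution(plan):
    """Count replicas and preferred leaders per broker in a reassignment plan."""
    replicas, leaders = Counter(), Counter()
    for p in plan["partitions"]:
        replicas.update(p["replicas"])
        leaders[p["replicas"][0]] += 1  # first replica is the preferred leader
    return replicas, leaders

# Hypothetical proposed plan, shaped like the --generate output.
proposed = json.loads("""
{"version": 1, "partitions": [
  {"topic": "orders", "partition": 0, "replicas": [0, 1]},
  {"topic": "orders", "partition": 1, "replicas": [2, 3]},
  {"topic": "orders", "partition": 2, "replicas": [3, 0]}
]}
""")
replicas, leaders = distribution(proposed)
print(dict(replicas))  # {0: 2, 1: 1, 2: 1, 3: 2}
print(dict(leaders))   # {0: 1, 2: 1, 3: 1}
```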

Will this cause any problems?

This approach seems more general, so why isn't it documented this way?


2 Answers


There are a few things to be aware of:

  1. Evenly distributing partitions does not necessarily distribute data evenly. Some partitions hold more data than others, so you need to look at how much data each partition holds when you plan how to spread the data evenly across the brokers. This is particularly true if you have single-partition topics or unevenly distributed keys.
  2. Be "rack aware". Your 8 brokers might be spread across 3 Amazon availability zones, or across different power supplies or network switches in your data center. In that case, be careful not to place the leader and all its replicas in the same rack ID, or you lose high availability.
  3. Consider using replication quotas. Moving lots of data between brokers can take network bandwidth away from active producers and consumers. Kafka 0.10.1 added replication quotas (bandwidth throttling), which let you limit the bandwidth used during reassignment so that it does not hurt your live client traffic. Just don't set the throttle too low, or your reassignment might never catch up with the new data coming from producers.
  4. You may want to consider a third-party tool that builds a reassignment plan automatically. Yahoo!'s Kafka Manager has a reassignment feature (see https://github.com/yahoo/kafka-manager/blob/master/README.md). Confluent offers a 30-day free trial of their Auto Rebalancer, which supports both expansion and reduction of broker nodes with rack awareness and throttled reassignment (see http://docs.confluent.io/current/kafka/rebalancer/rebalancer.html).
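Points 1 to 3 can be checked against a generated plan before running --execute. The rough Python sketch below assumes you have gathered per-partition sizes yourself, for example from the brokers' log directories, and that you know each broker's rack. The helper names, topic, sizes, and rack labels are hypothetical. The catch-up estimate is a deliberately simplified model. It treats the throttle rate (the tool's --throttle option, in bytes/sec) as the copy rate. It ignores per-broker limits and the separate leader and follower sides of the throttle.

```python
from collections import defaultdict

def broker_load(plan, sizes):
    """Sum partition sizes per broker; sizes maps (topic, partition) -> size."""
    load = defaultdict(int)
    for p in plan["partitions"]:
        size = sizes[(p["topic"], p["partition"])]
        for broker in p["replicas"]:
            load[broker] += size  # every replica stores a full copy
    return dict(load)

def single_rack_partitions(plan, rack_of):
    """List partitions whose replicas all live in one rack (no rack redundancy)."""
    return [
        (p["topic"], p["partition"])
        for p in plan["partitions"]
        if len({rack_of[b] for b in p["replicas"]}) == 1
    ]

def catch_up_seconds(backlog, throttle_rate, ingest_rate):
    """Simplified model: time to copy `backlog` while producers keep appending.

    Units only need to be consistent (e.g. MB and MB/s)."""
    net = throttle_rate - ingest_rate
    return float("inf") if net <= 0 else backlog / net

# Hypothetical plan: every broker holds two replicas...
plan = {"partitions": [
    {"topic": "logs", "partition": 0, "replicas": [0, 1]},
    {"topic": "logs", "partition": 1, "replicas": [1, 2]},
    {"topic": "logs", "partition": 2, "replicas": [2, 0]},
]}
# ...but partition 0 is ten times larger (sizes in GB).
sizes = {("logs", 0): 100, ("logs", 1): 10, ("logs", 2): 10}
rack_of = {0: "az-a", 1: "az-a", 2: "az-b"}

print(broker_load(plan, sizes))              # {0: 110, 1: 110, 2: 20}
print(single_rack_partitions(plan, rack_of)) # [('logs', 0)]
print(catch_up_seconds(4000, 50, 10))        # 100.0
print(catch_up_seconds(4000, 10, 10))        # inf
```

In this example, each broker holds exactly two replicas. Even so, broker 2 stores less than a fifth of what brokers 0 and 1 store, and partition 0 has both replicas in az-a. When the throttle equals the ingest rate, the estimate never finishes, which is the catch-up problem from point 3.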

If you pass the full topic list to the tool, it is likely to reassign all of your partitions.

In an already large cluster (thousands of topics), this would cause a lot of unnecessary data copying and leader elections. So typically you would provide only a subset of your topics and specify only the new brokers as destinations, to minimize the work required to complete the reassignment.
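To see the cost difference, a rough Python sketch can compare the current assignment with a proposed one. It counts two things. The first is replicas placed on brokers that did not already host them, since each of those is a full copy of that partition's data. The second is partitions whose first replica (the preferred leader) changes. Treating a preferred-leader change as a leadership move is a simplifying assumption. The topic name and all three plans below are hypothetical.

```python
def plan_cost(current, proposed):
    """Return (replica copies, preferred-leader changes) between two plans."""
    before = {(p["topic"], p["partition"]): p["replicas"]
              for p in current["partitions"]}
    copies = leader_changes = 0
    for p in proposed["partitions"]:
        old = before[(p["topic"], p["partition"])]
        # Each broker newly added to the replica set must copy the partition's data.
        copies += len(set(p["replicas"]) - set(old))
        # The first replica is the preferred leader.
        if p["replicas"][0] != old[0]:
            leader_changes += 1
    return copies, leader_changes

# Hypothetical assignment on the original two brokers (0 and 1).
current = {"partitions": [
    {"topic": "orders", "partition": 0, "replicas": [0, 1]},
    {"topic": "orders", "partition": 1, "replicas": [1, 0]},
]}
# Everything reshuffled across the expanded cluster.
full_shuffle = {"partitions": [
    {"topic": "orders", "partition": 0, "replicas": [2, 3]},
    {"topic": "orders", "partition": 1, "replicas": [1, 4]},
]}
# Only one replica per partition moves, onto a new broker.
new_brokers_only = {"partitions": [
    {"topic": "orders", "partition": 0, "replicas": [0, 2]},
    {"topic": "orders", "partition": 1, "replicas": [1, 3]},
]}
print(plan_cost(current, full_shuffle))      # (3, 1)
print(plan_cost(current, new_brokers_only))  # (2, 0)
```

Here the full shuffle copies three replicas and moves one preferred leader. The plan that only moves replicas onto new brokers copies two and keeps every preferred leader in place. On thousands of topics, that gap grows accordingly.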

If your cluster is small and doesn't hold GBs/TBs of data, passing all topics to the reassignment tool should be fine, and it is probably the easiest and fastest approach.