4
votes

By default, Kafka uses a single directory to keep its log. To increase performance, it is advisable to mount additional disks on the broker, assign each disk its own directory, and then set log.dirs in server.properties to a comma-separated list of those directories. The documentation says that partitions will be distributed among the directories round-robin; as I understand it now, that only applies to newly created topics.
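For example, with two mounted disks, server.properties might contain something like this (the paths are hypothetical):

```
# one directory per physical disk, comma-separated
log.dirs=/data/kafka1,/data/kafka2
```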

I would like to move half of the partitions of my already created topic to a newly added log.dir while keeping the other half where they are. Is there a supported way to do that?


1 Answer

4
votes

https://community.hortonworks.com/articles/59715/migrating-kafka-partitions-data-to-new-data-folder.html

Approach 1: Delete the existing data directory contents and configure the new data directory locations

In this approach, Kafka replicates the partition data from other members of the cluster. The complete partition data is replicated from the beginning, and all partitions are evenly allocated across the directory locations. Replication time depends on the data size: with a large amount of data, replicas may take a long time to rejoin the ISR, and the replication puts a lot of load on the network and cluster. This can cause problems, such as ISR changes and client errors. This approach should be fine for small clusters (GBs of data).

Note: In Kafka, the broker id is stored in the log.dir/meta.properties file. If broker.id is not configured, Kafka generates a new broker id by default. To avoid this, retain the existing meta.properties file in each log.dirs directory.
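For reference, meta.properties is a plain Java properties file; on older broker versions it looks roughly like this (the broker id below is hypothetical):

```
version=0
broker.id=1
```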

Approach 2: Move the partition directories to the new data directory (without copying the checkpoint files)

This is similar to the approach above, but here Kafka only re-replicates the moved partitions.
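The move itself is just relocating the topic-partition directories while the broker is stopped. A minimal sketch in Python, assuming hypothetical paths, a hypothetical topic name ("mytopic"), and that partitions 0-3 are the half being moved:

```python
import shutil
from pathlib import Path

OLD_DIR = Path("/data/kafka1")      # existing log.dir (hypothetical)
NEW_DIR = Path("/data/kafka2")      # newly added log.dir (hypothetical)
TOPIC = "mytopic"                   # topic whose partitions we move
PARTITIONS_TO_MOVE = {0, 1, 2, 3}   # half of an 8-partition topic

# Run only while the broker is stopped: move the chosen
# topic-partition directories to the new data directory.
for p in sorted(PARTITIONS_TO_MOVE):
    src = OLD_DIR / f"{TOPIC}-{p}"
    if src.is_dir():
        shutil.move(str(src), str(NEW_DIR / src.name))
```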

Approach 3: Move the partition directories and split the checkpoint files

Each data directory contains three checkpoint files: replication-offset-checkpoint, recovery-point-offset-checkpoint, and cleaner-offset-checkpoint. These hold, respectively, the last committed offset, the log recovery point, and the cleaner checkpoint for the partitions stored in that directory. Each file contains a version number, the number of entries, and one row per entry.
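For reference, a replication-offset-checkpoint (or recovery-point-offset-checkpoint) file looks roughly like this; the topic name, partition numbers, and offsets below are hypothetical:

```
0
2
mytopic 0 5000
mytopic 1 4200
```

The first line is the version, the second line is the number of entries, and each remaining row has the form `topic partition offset`.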

We need to copy or create these files in the new directory and update them, adjusting the entries in both directories (old and new). This can be tedious with a large number of partitions, but it is the best approach when we have a lot of data: replicas rejoin the ISR quickly, and the load on the cluster and network is low. See the sketch below.
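A minimal sketch of the splitting step, continuing the assumptions above (hypothetical paths, topic name, and partition set; broker stopped; cleaner-offset-checkpoint, if it has entries for the topic, can be handled the same way):

```python
from pathlib import Path

OLD_DIR = Path("/data/kafka1")      # existing log.dir (hypothetical)
NEW_DIR = Path("/data/kafka2")      # newly added log.dir (hypothetical)
TOPIC = "mytopic"
PARTITIONS_TO_MOVE = {0, 1, 2, 3}

CHECKPOINT_FILES = [
    "replication-offset-checkpoint",
    "recovery-point-offset-checkpoint",
]

def read_checkpoint(path):
    """Return (version, entries) with entries mapping (topic, partition) -> offset."""
    lines = path.read_text().splitlines()
    version = lines[0]
    entries = {}
    for line in lines[2:]:              # skip the version and entry-count lines
        topic, partition, offset = line.split()
        entries[(topic, int(partition))] = offset
    return version, entries

def write_checkpoint(path, version, entries):
    rows = [f"{t} {p} {o}" for (t, p), o in sorted(entries.items())]
    path.write_text("\n".join([version, str(len(rows))] + rows) + "\n")

for name in CHECKPOINT_FILES:
    version, old_entries = read_checkpoint(OLD_DIR / name)
    moved = {k: v for k, v in old_entries.items()
             if k[0] == TOPIC and k[1] in PARTITIONS_TO_MOVE}
    kept = {k: v for k, v in old_entries.items() if k not in moved}

    # Rewrite the old directory's checkpoint without the moved partitions...
    write_checkpoint(OLD_DIR / name, version, kept)

    # ...and add the moved partitions to the new directory's checkpoint,
    # merging with any entries already there.
    new_path = NEW_DIR / name
    new_entries = read_checkpoint(new_path)[1] if new_path.exists() else {}
    new_entries.update(moved)
    write_checkpoint(new_path, version, new_entries)
```

With the entries split this way, the broker sees consistent checkpoints for both directories on restart, so the moved replicas do not have to be re-replicated from scratch.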