0
votes

I have a 3-node Kafka cluster and I am creating a topic on one of the nodes with the command below: bin/kafka-create-topic.sh --zookeeper host1.com:2181,host2.com:2181,host3.com:2181 --replica 1 --partition 1 --topic test

Now, when I push messages to the topic, one of my hosts gets overloaded, since Kafka stores the topic's messages on disk. Is there any configuration I can set to distribute the storage across the cluster?

Thanks,


2 Answers

2
votes

As @om-nom-nom points out, you are creating a topic with a single partition and a single replica. All of that topic's data therefore lives on one broker, and even though you have a 3-node setup, the other two nodes will never store any of it.

Changing your topic to use multiple partitions is how you distribute a Kafka topic. The Kafka broker doesn't distribute messages to different nodes; it's the producer's responsibility to determine which partition a message goes to. You can decide this yourself, or let the producer use a round-robin approach to distribute messages across partitions, as @om-nom-nom points out.

In the Kafka producer, a partition key can be specified to indicate the destination partition of the message. By default, a hashing-based partitioner determines the partition id from the key, and custom partitioners can also be used.
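To illustrate the idea of hashing a key to a partition id, here is a minimal Python sketch. It is not Kafka's actual partitioner: the function name `choose_partition` is hypothetical, and CRC32 is only a stand-in for whatever hash the real producer uses.

```python
import zlib
from collections import Counter


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition id by hashing (hypothetical helper).

    CRC32 is an assumption used for illustration; Kafka's own
    partitioner uses a different hash function.
    """
    return zlib.crc32(key) % num_partitions


# Simulate 1000 distinct keys spread over a 3-partition topic.
keys = [f"user-{i}".encode() for i in range(1000)]
counts = Counter(choose_partition(k, 3) for k in keys)

# The same key always maps to the same partition.
same = choose_partition(b"user-42", 3) == choose_partition(b"user-42", 3)
```

With a single partition every key maps to partition 0, which is the situation described in the question.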

To reduce the number of open sockets, in 0.8.0 (https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning key is not specified or is null, the producer picks a random partition and sticks to it for some time (10 minutes by default) before switching to another one.
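The "pick a random partition and stick to it" behaviour for keyless messages can be sketched roughly as follows. This is a simulation in Python under stated assumptions, not the producer's real code: the class `StickyRandomPartitioner` and its parameters are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
import random


class StickyRandomPartitioner:
    """Hypothetical model of keyless partition selection.

    Picks a random partition and reuses it until refresh_interval_s
    seconds have passed, then picks again.
    """

    def __init__(self, num_partitions, refresh_interval_s=600.0, rng=None):
        self.num_partitions = num_partitions
        self.refresh_interval_s = refresh_interval_s  # 10 minutes by default
        self._rng = rng or random.Random()
        self._current = None
        self._chosen_at = None

    def partition(self, now_s):
        """Return the partition to use at time now_s (in seconds)."""
        expired = (
            self._current is None
            or now_s - self._chosen_at >= self.refresh_interval_s
        )
        if expired:
            self._current = self._rng.randrange(self.num_partitions)
            self._chosen_at = now_s
        return self._current


partitioner = StickyRandomPartitioner(3, rng=random.Random(0))
first = partitioner.partition(0.0)
# Every send within the next 10 minutes goes to the same partition.
within_window = [partitioner.partition(float(t)) for t in range(1, 600)]
after_window = partitioner.partition(600.0)
```

One consequence worth noting: with keyless messages, a single producer writes to only one partition at a time, so load evens out across partitions only over several refresh intervals or across several producers.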

1
votes

A topic can be sliced into multiple partitions (your config uses just 1), which by default are distributed across the brokers in round-robin fashion.
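A simplified Python sketch of round-robin placement shows why one partition ends up on one broker while several partitions spread out. The function `assign_partitions` and the broker ids 1, 2, 3 are hypothetical; the real assignment logic also handles replicas and may start from a different broker.

```python
def assign_partitions(num_partitions, broker_ids):
    """Assign each partition to a broker in round-robin order.

    Simplified illustration only: ignores replicas and always starts
    from the first broker in the list.
    """
    return {p: broker_ids[p % len(broker_ids)] for p in range(num_partitions)}


brokers = [1, 2, 3]  # hypothetical broker ids for a 3-node cluster

single = assign_partitions(1, brokers)  # the setup from the question
spread = assign_partitions(6, brokers)  # a topic with 6 partitions
```

Here `single` places the only partition on one broker, whereas `spread` gives each of the three brokers two partitions.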