
Please bear with me. I'm pretty new to Kafka. I'm working on a project where producers can come up at runtime (not a fixed number) and publish messages. Currently each producer publishes to a unique topic (topic.uuid) created at runtime in the Kafka broker. On the other end I have one consumer that subscribes with the pattern topic.*, so it picks up all of these topics and rebalances as new topics come in. Is this the correct approach?
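Kafka's Java consumer can subscribe with a regular expression (`KafkaConsumer.subscribe(Pattern)`), and the pattern is matched against the whole topic name. A cheap way to sanity-check such a pattern is to run it against sample topic names. The sketch below does that in Python, using `re.fullmatch` as an approximation of Java's whole-name match for simple patterns like these; the topic names are made up for illustration. Because `.` in a regex matches any character, `topic.*` also matches unrelated names such as `topics` or `topic_backup`, while an escaped dot restricts it to `topic.<uuid>`.

```python
import re

# Hypothetical topic names, as producers might create them at runtime.
TOPICS = [
    "topic.1b4e28ba-2fa1-11d2-883f-0016d3cca427",
    "topic.6fa459ea-ee8a-3ca4-894e-db77e160355e",
    "topics",        # unrelated topic that happens to share the prefix
    "topic_backup",  # another unrelated topic
    "orders",
]


def matching_topics(pattern: str, topics: list[str]) -> list[str]:
    """Return the topics whose entire name matches the regex pattern."""
    compiled = re.compile(pattern)
    return [t for t in topics if compiled.fullmatch(t)]


loose = matching_topics(r"topic.*", TOPICS)     # '.' matches any character
strict = matching_topics(r"topic\..*", TOPICS)  # requires a literal 'topic.'

print("topic.*   ->", loose)
print("topic\\..* ->", strict)
```

Also keep in mind that a pattern subscription only notices newly created topics when the consumer next refreshes its metadata (governed by `metadata.max.age.ms`), so there can be a delay before a new producer's topic starts being consumed.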

Now I'm confused about whether we should have one topic with multiple partitions or multiple topics with one partition each. Technically, it is the same.

But what complexity is involved in adding a new partition at runtime, and a new consumer for every partition at runtime, to achieve higher throughput? Various blogs mention that the number of partitions should equal the number of consumers in a group.
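One concrete complexity of adding partitions at runtime concerns keyed messages. For a record with a key, Kafka's default partitioner picks the partition as a hash of the serialized key modulo the number of partitions (Kafka uses murmur2 for this hash), and ordering is only guaranteed within a single partition. The sketch below uses `zlib.crc32` as a stand-in for murmur2, so the exact partition numbers differ from a real cluster; the keys are hypothetical. It only illustrates that increasing the partition count changes where many keys land.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition: hash(key) mod partition count.

    zlib.crc32 is a stand-in; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical message keys, e.g. one per entity whose events must stay ordered.
keys = [f"user-{i}" for i in range(100)]

before = {k: partition_for(k, 4) for k in keys}  # topic created with 4 partitions
after = {k: partition_for(k, 6) for k in keys}   # partitions later increased to 6

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys now map to a different partition")
```

Records already written stay where they are, so for a key that moved, older and newer records sit in different partitions and may be consumed out of order. This only matters for keyed records; unkeyed records carry no per-key ordering guarantee to begin with.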


Answer


Topics should be looked at from a functional point of view. You can have multiple topics, each one for a specific family of messages.

For example, you can have one topic for important messages that get parsed, and another topic for bulk loads/backups that uses log compaction to get a finer-grained, per-record retention policy.
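Log compaction (the topic-level setting `cleanup.policy=compact`) keeps at least the latest record for each key instead of deleting records purely by age or size. The sketch below models only that end result on an in-memory list of records, and it is a simplification: real Kafka compacts log segments in the background, keeps the original offsets, does not compact the active segment, and removes tombstones (records with a null value) only after `delete.retention.ms`. The backup keys are hypothetical.

```python
from typing import Optional

# (offset, key, value) records; a value of None is a tombstone marking a delete.
Record = tuple[int, str, Optional[str]]


def compact(log: list[Record]) -> list[Record]:
    """Keep only the latest record per key, preserving offset order.

    Simplification: tombstones are dropped immediately here, whereas Kafka
    retains them for delete.retention.ms before removing them.
    """
    latest: dict[str, Record] = {}
    for record in log:
        _, key, _ = record
        latest[key] = record  # later offsets overwrite earlier ones
    survivors = [r for r in latest.values() if r[2] is not None]
    return sorted(survivors, key=lambda r: r[0])


log = [
    (0, "backup-a", "v1"),
    (1, "backup-b", "v1"),
    (2, "backup-a", "v2"),
    (3, "backup-c", "v1"),
    (4, "backup-b", None),  # tombstone: backup-b was deleted
]

print(compact(log))  # [(2, 'backup-a', 'v2'), (3, 'backup-c', 'v1')]
```

This per-key "latest value wins" behaviour is what makes a compacted topic suitable for backups or state snapshots, where only the current value of each key matters.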

Partitions are interesting from a technical/architectural point of view. Kafka is a distributed system, and a topic can have several partitions. When you have multiple Kafka brokers, each broker is assigned a set of partitions that it is responsible for.

For example, if you have a topic with 24 partitions and run 3 Kafka brokers, each broker will be responsible for 8 of the partitions. Kafka and ZooKeeper take care of distributing the load across these partitions and, provided the partitions are replicated, of moving their leadership to the surviving brokers when a broker goes down.
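A toy model of the 24-partition, 3-broker example, assuming every partition is replicated on all three brokers (replication factor 3) and leadership is spread round-robin. When a broker fails, each partition it led fails over to the next live replica in its list. This is a sketch under those assumptions, not Kafka's actual placement or leader-election logic (the real controller picks a new leader from the in-sync replicas, and its replica placement is more elaborate).

```python
from collections import Counter

NUM_PARTITIONS = 24
BROKERS = [0, 1, 2]
REPLICATION_FACTOR = 3


def replica_list(partition: int) -> list[int]:
    """Naive round-robin replica placement; the first entry is the preferred leader."""
    n = len(BROKERS)
    return [BROKERS[(partition + i) % n] for i in range(REPLICATION_FACTOR)]


def leaders(live_brokers: set[int]) -> dict[int, int]:
    """Leader of each partition = first live broker in its replica list."""
    result = {}
    for p in range(NUM_PARTITIONS):
        live = [b for b in replica_list(p) if b in live_brokers]
        if live:  # with no live replica left, the partition would be offline
            result[p] = live[0]
    return result


healthy = leaders({0, 1, 2})
print("all brokers up:", Counter(healthy.values()))

degraded = leaders({0, 1})  # broker 2 has gone down
print("broker 2 down: ", Counter(degraded.values()))
```

With all brokers up, each leads 8 partitions; with broker 2 down, every partition still has a leader, but this naive placement sends all of broker 2's partitions to broker 0. With a replication factor of 1 there would be no other copy to fail over to, and those partitions would be unavailable until the broker returned.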

Consumers read from these partitions, fetching from each partition's leader broker (by default). If you have multiple consumers in a consumer group, they divide the topic's partitions among themselves.

Within a group, each partition is consumed by at most one consumer. For example, if you have more consumers in a consumer group than partitions in the topic, some consumers will sit idle and never get any messages.
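How a group divides partitions depends on the configured assignment strategy (`partition.assignment.strategy`). The sketch below follows the idea of Kafka's range assignor for a single topic: consumers are sorted, each gets a contiguous block of partitions, and the first `partitions % consumers` consumers get one extra. The function and consumer names are hypothetical. It shows both points above: each partition goes to exactly one consumer in the group, and once consumers outnumber partitions, the surplus consumers get nothing, so the partition count caps a group's useful parallelism.

```python
def range_assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Assign partitions 0..num_partitions-1 of one topic to consumers.

    Mirrors the range-assignment idea: sorted consumers, contiguous blocks,
    and the first (num_partitions % len(consumers)) consumers get one extra.
    """
    members = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


# 8 partitions, 3 consumers: every consumer gets work (3, 3 and 2 partitions).
print(range_assign(8, ["c1", "c2", "c3"]))

# 3 partitions, 5 consumers: c4 and c5 are assigned nothing and stay idle.
print(range_assign(3, ["c1", "c2", "c3", "c4", "c5"]))
```

Because the range assignor works topic by topic, a pattern subscription over many single-partition topics (as in the question) would hand partition 0 of every topic to the first consumer in sorted order; the round-robin or sticky assignors spread such topics more evenly across the group.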