0
votes

I have been reading the documentation here and am a bit unsure about the partitioned log.

First they say:

For each topic, the Kafka cluster maintains a partitioned log that looks like this:

Then they show a picture:

enter image description here

They also say:

The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions so it can handle an arbitrary amount of data. Second they act as the unit of parallelism—more on that in a bit.

Do I understand correctly that:

  1. A cluster can hold only one partition of a given topic? In other words, two partitions of the same topic cannot be in the same cluster?
  2. A cluster can hold partitions from multiple different topics?
  3. The picture of a topic should look more like this? enter image description here
2
Your understanding in point 1 is incorrect. A topic can have many partitions, which are spread across different machines (broker nodes) in the cluster – Hans Jespersen

2 Answers

1
votes

A topic consists of one or more partitions. You specify the number of partitions when creating the topic, and partitions can also be added after creation.

Kafka will spread the partitions across as many brokers as it can in the cluster. If you only have a single broker, they will all be on that broker.

Many partitions from the same topic can live on the same broker. This happens all the time: many clusters have only a dozen or so brokers, and it's not uncommon for a topic to have 50 partitions, so several partitions from the same topic will end up on the same broker.

What the docs say is that a partition is a unit that cannot be split: it is either on a broker or it is not. A topic, on the other hand, is just a collection of partitions that share the same name and configuration.
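To make the spreading concrete, here is a minimal Python sketch of a simplified round-robin placement. It is only a toy model: the function name `spread_partitions` and the broker names are made up for illustration, and Kafka's real assignment logic also takes replicas and other factors into account.

```python
from collections import defaultdict


def spread_partitions(num_partitions, brokers):
    """Toy model (not Kafka's algorithm): partition i goes to
    brokers[i % len(brokers)], so partitions are spread as evenly
    as the broker count allows."""
    placement = defaultdict(list)
    for partition in range(num_partitions):
        broker = brokers[partition % len(brokers)]
        placement[broker].append(partition)
    return dict(placement)


# A single broker ends up hosting every partition of the topic.
single = spread_partitions(50, ["broker-0"])

# Twelve brokers and 50 partitions: every broker hosts several
# partitions of the same topic.
many = spread_partitions(50, [f"broker-{n}" for n in range(12)])
counts = sorted(len(partitions) for partitions in many.values())

print(len(single["broker-0"]))  # partitions on the only broker
print(counts)                   # partitions per broker, sorted
```

In this model each of the 12 brokers ends up with 4 or 5 partitions of the same topic, and each partition lives on exactly one broker as a whole.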

1
votes

To answer your question:

  1. For a Kafka cluster of b brokers and a topic with p partitions, each broker will hold roughly p/b partitions as the primary (leader) copy. Brokers may also hold replica copies of other partitions, depending on your replication factor. For example, with a 3-node cluster and a topic test with 6 partitions, each node will hold 2 partitions.

  2. Yes, it certainly can. Extending the previous point, if you have two topics, test1 and test2, each with 6 partitions, then each broker will hold 4 partitions in total (2 from each topic).

  3. I guess that in your diagram you have mislabeled the brokers as clusters.
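The arithmetic in points 1 and 2 can be checked with a small Python sketch. It assumes a simplified placement in which the replicas of partition i sit on consecutive brokers starting at broker i mod b, with the first replica acting as the primary copy. The helper names `assign_replicas` and `per_broker_counts` are hypothetical, and real Kafka placement is more involved.

```python
from collections import Counter


def assign_replicas(num_partitions, num_brokers, replication_factor):
    """Simplified model: replicas of partition p sit on consecutive
    brokers starting at p % num_brokers; the first one is the primary."""
    if replication_factor > num_brokers:
        raise ValueError("replication factor cannot exceed broker count")
    return {
        p: [(p + j) % num_brokers for j in range(replication_factor)]
        for p in range(num_partitions)
    }


def per_broker_counts(assignment):
    """Return (primary copies per broker, all copies per broker)."""
    primaries = Counter(replicas[0] for replicas in assignment.values())
    copies = Counter(b for replicas in assignment.values() for b in replicas)
    return dict(primaries), dict(copies)


# Point 1: topic "test", 6 partitions, 3 brokers, replication factor 2.
primaries, copies = per_broker_counts(assign_replicas(6, 3, 2))
print(primaries)  # primary copies held by each broker
print(copies)     # primary + replica copies held by each broker

# Point 2: topics "test1" and "test2", 6 partitions each, no extra replicas.
totals = Counter()
for topic in ("test1", "test2"):
    topic_primaries, _ = per_broker_counts(assign_replicas(6, 3, 1))
    totals.update(topic_primaries)
print(dict(totals))  # partitions per broker across both topics
```

Under these assumptions each broker holds 2 primary copies of test (p/b = 6/3), 4 copies in total once the replication factor is 2, and 4 partitions in total when test1 and test2 share the cluster.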