I'm a bit confused about topic partitioning in Apache Kafka. I've sketched out a simple use case and would like to know what happens in different scenarios. Here it is:

I have a topic T that has 4 partitions: TP1, TP2, TP3 and TP4.

Assume that I have 8 messages M1 to M8. Now when my producer sends these messages to the topic T, how will they be received by the Kafka broker under the following scenarios:

Scenario 1: There is only one Kafka broker instance, which hosts topic T with the aforementioned partitions.

Scenario 2: There are two Kafka broker instances, with each node hosting the same topic T with the aforementioned partitions.

Now, assuming that Kafka broker instance 1 goes down, how will the consumers react? I'm assuming that my consumer was reading from broker instance 1.

1 Answer

I'll answer your questions by walking you through partition replication, because you need to learn about replication to understand the answer.

A single broker is considered the "leader" for a given partition. All produces and consumes go through the leader. The partition is replicated to a configurable number of other brokers, called followers, which copy each write by fetching it from the leader. Followers that are caught up to the leader are called "in-sync replicas" (ISR). You can configure what "caught up" means.
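To make the vocabulary concrete, here is a minimal sketch of one partition's replica set. It is a toy model, not Kafka's internal data structure; the `Partition` class, its field names and the broker ids are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Partition:
    """Toy model of one partition's replica set (hypothetical, not a Kafka API)."""
    name: str
    leader: int                                     # broker id serving produces and consumes
    replicas: List[int]                             # every broker holding a copy, leader included
    in_sync: Set[int] = field(default_factory=set)  # replicas caught up to the leader

    def followers(self) -> List[int]:
        # Followers are the replicas that copy data from the leader.
        return [b for b in self.replicas if b != self.leader]


# Assumed layout: partition TP1 of topic T lives on brokers 1 and 2, led by broker 1.
tp1 = Partition(name="TP1", leader=1, replicas=[1, 2], in_sync={1, 2})
print(tp1.followers())  # [2]
```

Each of the four partitions of T would have its own leader and ISR, so different partitions can be led by different brokers.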

A message is made available to consumers only once it has been replicated to all in-sync replicas, at which point it is considered committed.
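The visibility rule can be sketched as taking the minimum replicated offset across the ISR (Kafka calls this boundary the high watermark). The function names and the offset bookkeeping below are simplified assumptions, not the broker's actual implementation.

```python
def high_watermark(log_end_offsets, in_sync):
    """Return the first offset NOT yet replicated to every in-sync replica.

    log_end_offsets maps broker id -> next offset that broker will write.
    """
    return min(log_end_offsets[b] for b in in_sync)


def visible_messages(leader_log, log_end_offsets, in_sync):
    # Consumers may only read offsets below the high watermark.
    return leader_log[:high_watermark(log_end_offsets, in_sync)]


leader_log = ["M1", "M2", "M3", "M4"]
# Broker 1 (leader) has written 4 messages; follower broker 2 has copied only 3.
log_end_offsets = {1: 4, 2: 3}
print(visible_messages(leader_log, log_end_offsets, in_sync={1, 2}))  # ['M1', 'M2', 'M3']
```

In this example M4 stays invisible until broker 2 catches up or drops out of the ISR.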

If the leader for a given partition fails, the Kafka controller (one broker in the cluster that takes on this role) will elect a new leader from the list of in-sync replicas, and consumers will begin consuming from this new leader. Consumers will see a short delay while the failure is detected and the new leader is elected. A new controller will also be elected automatically if the controller itself fails (this adds more latency, too).

If the topic is configured with no additional replicas (a replication factor of 1), then when the leader of a given partition fails, consumers can't consume from that partition until the broker that was the leader is brought back online. If it is never brought back online, the data previously produced to that partition is lost forever.
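The failover behaviour in the last two paragraphs can be sketched as follows. This is a simplified assumption of how a new leader is picked (the first surviving in-sync replica in the assignment order); `elect_leader` is a hypothetical helper, not a Kafka function.

```python
from typing import List, Optional, Set


def elect_leader(replicas: List[int], in_sync: Set[int], failed: Set[int]) -> Optional[int]:
    """Pick the first surviving in-sync replica, or None if the partition is offline."""
    for broker in replicas:
        if broker in in_sync and broker not in failed:
            return broker
    return None


# Replication factor 2: broker 1 fails, broker 2 is in sync and takes over.
print(elect_leader(replicas=[1, 2], in_sync={1, 2}, failed={1}))  # 2
# Replication factor 1: the only copy lived on broker 1, so the partition is unavailable.
print(elect_leader(replicas=[1], in_sync={1}, failed={1}))        # None
```

A `None` result corresponds to the case where consumers have to wait for the failed broker to come back.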

To answer your question directly:

  • Scenario 1: if replication is configured for the topic and an in-sync replica exists for each partition, a new leader will be elected, and consumers will experience only a brief delay because of the failure.
  • Scenario 2: now that you understand replication, I believe you'll see that this scenario is Scenario 1 with a replication factor of 2.

You may also be interested to learn about acks in the producer.

In the producer, you can configure acks such that the produce is acknowledged when:

  • the message is put on the producer's socket buffer (acks=0)
  • the message is written to the log of the partition leader (acks=1)
  • the message is written to the log of the partition leader and replicated to all other in-sync replicas (acks=all)
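The three settings above can be compared with a small toy model. `is_acknowledged` and its boolean parameters are invented for illustration; a real producer learns this from the broker's response rather than from flags.

```python
def is_acknowledged(acks: str, written_to_leader: bool, replicated_to_isr: bool) -> bool:
    """Toy model of when a producer treats a send as acknowledged."""
    if acks == "0":
        return True  # fire and forget: no broker response is awaited
    if acks == "1":
        return written_to_leader
    if acks == "all":
        return written_to_leader and replicated_to_isr
    raise ValueError(f"unsupported acks value: {acks!r}")


# The leader has the message, but the followers have not copied it yet.
for acks in ("0", "1", "all"):
    print(acks, is_acknowledged(acks, written_to_leader=True, replicated_to_isr=False))
```

With acks=0 or acks=1 the send counts as done at this point, so a leader failure right now could lose the message; acks=all waits for the in-sync replicas.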

Further, you can configure the minimum number of in-sync replicas required to commit a produce (the min.insync.replicas setting, which applies when acks=all). If fewer in-sync replicas exist than this setting requires, the produce will fail. You can build your producer to handle this failure in different ways: buffer, retry, do nothing, block, etc.
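As a rough sketch of the "buffer" strategy, the following simulates a produce that is rejected when the ISR is too small. Everything here is a stand-in: `NotEnoughReplicas`, `produce` and `produce_with_buffer` are hypothetical names, and the log is a plain list rather than a broker.

```python
class NotEnoughReplicas(Exception):
    """Stand-in for the error returned when the ISR is smaller than required."""


def produce(message, in_sync, min_insync_replicas, log):
    # With acks=all, the write is rejected unless enough replicas are in sync.
    if len(in_sync) < min_insync_replicas:
        raise NotEnoughReplicas(f"ISR size {len(in_sync)} < {min_insync_replicas}")
    log.append(message)


def produce_with_buffer(message, in_sync, min_insync_replicas, log, pending):
    """One possible failure policy: keep the message locally for a later retry."""
    try:
        produce(message, in_sync, min_insync_replicas, log)
        return True
    except NotEnoughReplicas:
        pending.append(message)
        return False


log, pending = [], []
produce_with_buffer("M1", in_sync={1, 2}, min_insync_replicas=2, log=log, pending=pending)
produce_with_buffer("M2", in_sync={1}, min_insync_replicas=2, log=log, pending=pending)
print(log, pending)  # ['M1'] ['M2']
```

Retrying, blocking or dropping the message would replace the `pending.append` line; which one fits depends on how much data loss or delay your application can tolerate.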