29 votes

The Kafka controller in a Kafka cluster is in charge of managing partition leaders and replication.

If there are 100 brokers in a Kafka cluster, is the controller just one Kafka broker? So out of the 100 brokers, is the controller the leader?

How would you know which broker is the controller?

Is the management of the Kafka Controller critical to Kafka system management?


3 Answers

27 votes

Within a Kafka cluster, a single broker serves as the active controller, which is responsible for the state management of partitions and replicas. So in your case, if you have a cluster with 100 brokers, one of them will act as the controller.

More details regarding the responsibilities of a cluster controller can be found here.

In order to find which broker is the controller of a cluster, you first need to connect to ZooKeeper through the ZK CLI:

./bin/zkCli.sh -server localhost:2181 

and then get the controller:

[zk: localhost:2181(CONNECTED) 0] get /controller

The output should look like the one below:

{"version":1,"brokerid":100,"timestamp":"1506423376977"}
cZxid = 0x191
ctime = Tue Sep 26 12:56:16 CEST 2017
mZxid = 0x191
mtime = Tue Sep 26 12:56:16 CEST 2017
pZxid = 0x191
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x15ebdd241840002
dataLength = 56
numChildren = 0
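If you'd rather not talk to ZooKeeper directly, you can also ask the cluster itself. A minimal sketch using Kafka's Java AdminClient (the bootstrap address localhost:9092 is an assumption; substitute any broker of your cluster):

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class FindController {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // describeCluster() exposes the current controller as a Node
            Node controller = admin.describeCluster().controller().get();
            System.out.printf("Controller is broker %d (%s:%d)%n",
                    controller.id(), controller.host(), controller.port());
        }
    }
}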

ZooKeeper stores the state of the Kafka cluster. It is used for the controller election, both when the cluster first starts and when the current controller crashes. The controller is also responsible for telling other replicas to become partition leaders when the broker hosting a partition leader for a topic fails/crashes.

24 votes

The controller is one of the Kafka brokers; in addition to the usual broker functionality, it is also responsible for electing partition leaders.

Is the controller just one broker?

There is only 1 controller at a time.

Internally, each broker tries to create an ephemeral node in ZooKeeper (/controller). The first one succeeds and becomes the controller. The others get a proper exception ("node already exists") and set a watch on the controller node. When the controller dies, the ephemeral node is removed and the watching brokers are notified. Again, the first one among them that succeeds in registering the ephemeral node becomes the new controller; the others will once again get the "node already exists" exception and keep on waiting.
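A rough sketch of that dance with the plain ZooKeeper Java client (illustrative only; Kafka's real controller code is considerably more involved, but the /controller path and the ephemeral-node trick are real):

import org.apache.zookeeper.*;

public class ControllerElection implements Watcher {
    private final ZooKeeper zk;
    private final int brokerId;

    public ControllerElection(ZooKeeper zk, int brokerId) {
        this.zk = zk;
        this.brokerId = brokerId;
    }

    public void elect() throws KeeperException, InterruptedException {
        try {
            // Ephemeral: the node disappears automatically when our ZK session dies
            zk.create("/controller",
                      ("{\"brokerid\":" + brokerId + "}").getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE,
                      CreateMode.EPHEMERAL);
            // Success: this broker is now the controller
        } catch (KeeperException.NodeExistsException e) {
            // Somebody else won the race; watch the node instead
            zk.exists("/controller", this);
        }
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted) {
            try {
                elect(); // the controller died: race to replace it
            } catch (Exception e) {
                // real code would log and retry here
            }
        }
    }
}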

How would you know who is the controller in Kafka?

When a new controller is elected, it receives a "controller epoch" number from ZooKeeper. The brokers know the current controller epoch, and if they receive a message from a controller with an older number, they know to ignore it.
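On the broker side, the fencing check is tiny. A hypothetical sketch (class and field names are made up for illustration; this is not Kafka's actual code):

class ControllerEpochCheck {
    private int highestSeenControllerEpoch = 0;

    boolean shouldAccept(int requestEpoch) {
        if (requestEpoch < highestSeenControllerEpoch) {
            return false; // request comes from a stale ("zombie") controller: ignore it
        }
        highestSeenControllerEpoch = requestEpoch;
        return true;
    }
}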

Is the controller the leader?

Not really. Each partition has its own leader. When a broker dies, the controller goes over all the partitions that need a new leader, determines who the new leader should be (simply a replica from that partition's in-sync replica list, a.k.a. the ISR) and sends a request to all the brokers that contain either the new leaders or the existing followers for those partitions.

The new leaders now know that they need to start serving producer and consumer requests from clients, while the followers now know that they need to start replicating from the new leader.
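A toy version of that leader choice (assuming plain lists of broker ids; Kafka's actual logic prefers the first assigned replica that is alive and in the ISR):

import java.util.List;
import java.util.Optional;
import java.util.Set;

class LeaderElector {
    // Pick the first assigned replica that is both alive and in sync
    static Optional<Integer> chooseLeader(List<Integer> assignedReplicas,
                                          Set<Integer> isr,
                                          Set<Integer> aliveBrokers) {
        return assignedReplicas.stream()
                .filter(isr::contains)
                .filter(aliveBrokers::contains)
                .findFirst(); // empty => no eligible leader, partition goes offline
    }
}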

7 votes

The Kafka controller is the brain of the Kafka cluster. It monitors the liveness of the brokers and acts on broker failures.

There will be only one Kafka controller in the cluster at any time. The controller is one of the Kafka brokers in the cluster and, in addition to the usual broker functionality, is responsible for electing partition leaders whenever an existing broker leaves the cluster or a new broker joins it.

The first broker that starts in the cluster becomes the Kafka controller by creating an ephemeral node called "/controller" in ZooKeeper. When the other brokers start, they also try to create this node, but receive a "node already exists" exception, from which they understand that a controller has already been elected in the cluster.

When ZooKeeper doesn't receive heartbeat messages from the controller, the ephemeral node gets deleted. Through ZooKeeper watchers, this notifies all the other brokers in the cluster that the controller is gone, which starts a new controller election. All the other brokers will again try to create the ephemeral node "/controller", and the first one to succeed becomes the new controller.

There is a possibility of ending up with more than one controller in a cluster. Consider a case where a long GC (garbage collection) pause hits the current controller ("Controller_1"), and ZooKeeper doesn't receive a heartbeat message from it within the configured amount of time. This causes the "/controller" node to be deleted from ZooKeeper, and another broker from the cluster gets elected as the new controller ("Controller_2").

In this situation, we have two controllers, "Controller_1" and "Controller_2", in the cluster. When "Controller_1"'s GC pause finishes, it may attempt to write/update the state in ZooKeeper. "Controller_2" will also attempt to write/update the state in ZooKeeper, and writes from both the old and the new controller can leave the Kafka cluster in an inconsistent state.

In order to avoid this, a new "epoch" is generated every time a controller election takes place. Each time a controller is elected, it receives a new, higher epoch through a ZooKeeper conditional increment operation.
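The "conditional increment" rests on ZooKeeper's versioned writes. A sketch with the plain ZooKeeper client (/controller_epoch is the znode Kafka actually uses for this; error handling is trimmed):

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

class EpochBumper {
    // Read the current epoch, then write epoch + 1 only if nobody wrote in between
    static int incrementEpoch(ZooKeeper zk) throws KeeperException, InterruptedException {
        Stat stat = new Stat();
        byte[] data = zk.getData("/controller_epoch", false, stat);
        int newEpoch = Integer.parseInt(new String(data)) + 1;
        // setData with an expected version fails with a BadVersionException
        // if another (newer) controller bumped the epoch first
        zk.setData("/controller_epoch", String.valueOf(newEpoch).getBytes(), stat.getVersion());
        return newEpoch;
    }
}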

With this, when an old controller ("Controller_1") attempts to update something, ZooKeeper compares the current epoch with the older epoch sent by the old controller in its write/update request and simply rejects it. All the other brokers in the cluster also know the current controller epoch, and if they receive a message from an old controller with an older epoch, they ignore it as well.