0
votes

I am new to Kafka. I have set up a basic single-node cluster using Ambari.

I want to understand the recommended configuration for a production server. Let's say that in production I will have 5 topics, each getting traffic in the range of 500,000 to 50,000,000 in a day.

I am thinking of setting up a 3-4 node Kafka cluster using EC2 r5.xlarge instances.

I am mostly confused about the Zookeeper part. I understand that Zookeeper needs an odd number of nodes and that Zookeeper is installed on all Kafka nodes, so how do I run Kafka with an even number of nodes? If this is true, it will limit Kafka to an odd number of nodes as well.

Is it really necessary to install Zookeeper on all Kafka nodes? Can I install Zookeeper and the Kafka brokers on separate nodes, and if so, how?

What if I want to run multiple Kafka clusters? Is it possible to manage multiple Kafka clusters through a single Zookeeper cluster, and if so, how?

I have only recently started learning Kafka, so any help would be appreciated.

Thanks,

2
"zookeeper is installed on all kafka nodes" -- No, don't do this. - OneCricketeer
Well, it's not such a big no. It is recommended to separate them, but if it's just one Kafka cluster with a dedicated Zookeeper, there should not be a problem. For example, the managed Kafka cluster we use in production runs like this. - sowieso-fruehling
@sowieso For any cluster larger than 5 brokers, not every broker needs it. But even then, Kafka and Zookeeper both prefer to use lots of page cache / JVM memory, so it's best to separate them anyway. - OneCricketeer
@cricket_007 I agree. Moreover, quorum negotiation gets slower with more nodes, so 5 servers should be the maximum for a Zookeeper ensemble. - sowieso-fruehling

2 Answers

2
votes

I am mostly confused about the Zookeeper part. I understand that Zookeeper needs an odd number of nodes and that Zookeeper is installed on all Kafka nodes, so how do I run Kafka with an even number of nodes? If this is true, it will limit Kafka to an odd number of nodes as well.

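Zookeeper can be installed on the same servers as Kafka, but it doesn't have to be. Running Zookeeper on an odd number of nodes is not a requirement, just a very good recommendation: an ensemble stays available only while a majority of its nodes are up, so an even-sized ensemble tolerates no more failures than the odd-sized one just below it.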
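The reasoning behind the odd-number recommendation can be sketched with majority arithmetic. This is a minimal illustration, not Zookeeper code; the function names `quorum_size` and `tolerated_failures` are made up for the example.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare ensemble sizes 1..7: each even size tolerates the same
    # number of failures as the odd size just below it.
    for n in range(1, 8):
        print(f"{n} nodes: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under this arithmetic, 3 and 4 nodes both tolerate one failure and 5 and 6 both tolerate two, which is why the extra even node buys nothing. It says nothing about the number of Kafka brokers, which can be odd or even.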

Is it really necessary to install Zookeeper on all Kafka nodes? Can I install Zookeeper and the Kafka brokers on separate nodes, and if so, how?

It is not required, and it's even better not to have Zookeeper and Kafka on the same server. Installing Zookeeper on a separate server is much the same as installing it alongside Kafka. Every Kafka broker needs its zookeeper.connect setting to point to all of the Zookeeper nodes.
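As a rough sketch, the zookeeper.connect value is a comma-separated list of host:port pairs. The helper below only builds that string; the hostnames zoo1, zoo2 and zoo3 are assumptions for illustration, and 2181 is Zookeeper's default client port.

```python
def zookeeper_connect(hosts, port=2181):
    """Join ensemble hosts into a comma-separated host:port list."""
    if not hosts:
        raise ValueError("at least one Zookeeper host is required")
    return ",".join(f"{host}:{port}" for host in hosts)


# Hypothetical ensemble hostnames; replace with your own.
ensemble = ["zoo1", "zoo2", "zoo3"]

# The line that would go into each broker's server.properties.
line = f"zookeeper.connect={zookeeper_connect(ensemble)}"
print(line)
```

The same line would go into server.properties on every broker, whether or not a broker happens to share a machine with a Zookeeper node.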

What if I want to run multiple Kafka clusters? Is it possible to manage multiple Kafka clusters through a single Zookeeper cluster, and if so, how?

It is possible. In this case it's recommended to have servers dedicated to the Zookeeper ensemble, and each cluster's zookeeper.connect setting should use hostname:port/path, with a different path per cluster, instead of just hostname:port.

2
votes

Can I install Zookeeper and the Kafka brokers on separate nodes, and if so, how?

You can, and you should if you have the available resources.


Run zookeeper-server-start zookeeper.properties on an odd number of servers (at most 5, or 7 for larger Kafka clusters).

On every other machine that is a Kafka broker (not the same servers as Zookeeper), edit server.properties so that the zookeeper.connect property points to that set of Zookeeper machine addresses.

Then run kafka-server-start server.properties on every new Kafka broker.

From there, you can scale Kafka independently of Zookeeper.
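To illustrate the broker side of these steps, here is a minimal sketch that renders the two settings involved for each broker. The broker hostnames and the ensemble address are assumptions, and a real server.properties contains many more settings than these two.

```python
# Hypothetical Zookeeper ensemble, shared by every broker in the cluster.
ZOOKEEPER_CONNECT = "zoo1:2181,zoo2:2181,zoo3:2181"


def render_server_properties(broker_id: int, zookeeper_connect: str) -> str:
    """Return the broker.id and zookeeper.connect lines for one broker."""
    return (
        f"broker.id={broker_id}\n"
        f"zookeeper.connect={zookeeper_connect}\n"
    )


# Hypothetical broker hosts: four brokers against a three-node ensemble.
brokers = ["broker1", "broker2", "broker3", "broker4"]

# broker.id must be unique per broker; zookeeper.connect is identical.
configs = {
    host: render_server_properties(broker_id, ZOOKEEPER_CONNECT)
    for broker_id, host in enumerate(brokers)
}

for host, text in configs.items():
    print(f"# {host}: server.properties")
    print(text)
```

The example deliberately uses an even number of brokers with an odd-sized ensemble, since the two counts are independent once Zookeeper runs on its own machines.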

Is it possible to manage multiple Kafka clusters through a single Zookeeper cluster

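Look up Zookeeper chroots: a path suffix on the connection string that gives each Kafka cluster its own subtree in Zookeeper.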

One Kafka cluster would be defined as

zoo1:2181/kafka1

And a second

zoo1:2181/kafka2

Be careful not to mix those up: brokers that use the same chroot join the same Kafka cluster, so machines that shouldn't be in the same cluster must not share one.
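A quick way to check for such a mix-up is to group brokers by the chroot in their zookeeper.connect value. This is a sketch under the assumption that all brokers point at the same Zookeeper ensemble; the broker names and the `chroot_of` helper are made up for the example.

```python
from collections import defaultdict


def chroot_of(connect: str) -> str:
    """Return the chroot path of a zookeeper.connect value ('/' if none)."""
    _hosts, slash, path = connect.partition("/")
    return "/" + path if slash else "/"


# Hypothetical brokers and their zookeeper.connect values.
broker_settings = {
    "broker1": "zoo1:2181/kafka1",
    "broker2": "zoo1:2181/kafka1",
    "broker3": "zoo1:2181/kafka2",
}

# Brokers that share a chroot (on the same ensemble) form one cluster.
clusters = defaultdict(list)
for broker, connect in broker_settings.items():
    clusters[chroot_of(connect)].append(broker)

for chroot, members in sorted(clusters.items()):
    print(chroot, members)
```

When several Zookeeper hosts are listed, the chroot is written once at the end of the whole string, for example zoo1:2181,zoo2:2181,zoo3:2181/kafka1, which is why the helper splits on the first slash.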


You can find various CloudFormation, Terraform, or Ansible repos on GitHub for setting up Kafka in a distributed way in the cloud, or go for Kubernetes if you are familiar with it.