
I am going to set up a Kafka cluster for our message-intensive system. Currently we have two Kafka clusters, one based in London (LD) as primary and another one based in New York (NY) as DR (backup), and we have written Java clients to replicate data from LD to NY.

Since Kafka has built-in features such as partitioning and replication for scalability, high availability and failover, we want to create a single bigger cluster comprising servers in both London and New York.

But...

We are having connectivity problems between the NY and LD servers: the network speed is really bad.

I have performed several tests.

Producer config: acks=1 (requires acknowledgement from the partition leader only), sending asynchronously.

  1. When producers in London send messages to brokers in LD, the throughput is 100,000 msg/sec; with a message size of 100 bytes, that is 10 MB/sec.

  2. When producers in London send messages to a broker in NY, the throughput is 10 msg/sec; with a message size of 100 bytes, that is 1 KB/sec.
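
As a sanity check on the two figures above, here is a small sketch of the arithmetic (throughput_bytes is just an illustrative helper name):

```shell
# Bytes per second = messages per second * message size in bytes.
throughput_bytes() {
  echo $(( $1 * $2 ))
}

throughput_bytes 100000 100   # LD producer -> LD broker
throughput_bytes 10 100       # LD producer -> NY broker
echo $(( 100000 / 10 ))       # slowdown factor between the two paths
```

This prints 10000000 (10 MB), 1000 (1 KB) and 10000, so the cross-Atlantic path is about four orders of magnitude slower in this test.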

So...

I am looking for a way to make sure producers and consumers take advantage of locality, meaning that clients on the same network talk to the nearest broker. Let's say: producers in LD will send messages to LD-based brokers. (I understand that read/write requests are only served by the partition leader.)

Any suggestions would be highly appreciated.


1 Answer


From what I understand, your current setup is:

  • 1 Broker located in NY.
  • 1 Broker located in LD.
  • n number of topics. (I am going to assume the number of topics is 1).
  • n number of partitions on the topic. (I am going to assume the number of partitions is 2).
  • Both partitions are replicated across both brokers.

You want to make the broker located in LD the leader of all the partitions, so that all producers interact with this broker and the broker located in NY is used only for replication. If this is the case, you can do the following:

Check the configuration of your topic:

./kafka-topics.sh --describe --topic stream-log

Topic:stream-log    PartitionCount:2    ReplicationFactor:2 Configs:
  Topic: stream-log   Partition: 0    Leader: 0   Replicas: 0,1 Isr: 0,1
  Topic: stream-log   Partition: 1    Leader: 1   Replicas: 1,0 Isr: 1,0

And assuming:

  • LD Broker ID: 0
  • NY Broker ID: 1
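
With more than a couple of partitions, reading the describe output by eye gets tedious. Here is a minimal sketch of a filter that lists the partitions whose leader is not the LD broker, assuming the output layout shown above (non_local_leaders is a made-up name, not a Kafka tool):

```shell
# List partitions whose current leader is not the given broker ID.
# Reads `kafka-topics.sh --describe` output on stdin; the wanted
# broker ID is the first argument.
non_local_leaders() {
  awk -v want="$1" '
    {
      part = ""; leader = ""
      # Scan "Key: value" pairs; the summary line has none of these keys.
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Leader:")    leader = $(i + 1)
      }
      if (part != "" && leader != "" && leader != want)
        print "partition " part " is led by broker " leader
    }'
}

# Sample input copied from the describe output above.
non_local_leaders 0 <<'EOF'
Topic:stream-log    PartitionCount:2    ReplicationFactor:2 Configs:
  Topic: stream-log   Partition: 0    Leader: 0   Replicas: 0,1 Isr: 0,1
  Topic: stream-log   Partition: 1    Leader: 1   Replicas: 1,0 Isr: 1,0
EOF
```

In practice you would pipe the real describe command into it instead of the sample text.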

You can see that the leader of partition 1 is broker 1 (NY). To change that, the partitions need to be reassigned:

./kafka-reassign-partitions.sh --reassignment-json-file manual_assign.json --execute

The contents of the JSON file (broker 0 is listed first for both partitions, which makes it the preferred leader):

{"partitions": [
  {"topic": "<topic-name>", "partition": 0, "replicas": [0,1]},
  {"topic": "<topic-name>", "partition": 1, "replicas": [0,1]}
 ],
 "version":1
}
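
The first broker in each "replicas" list is the preferred replica, which is the one the election step promotes to leader. For a topic with many partitions the file can be generated rather than typed. Here is a rough sketch, assuming every partition should get the same replica order (make_reassignment is a hypothetical helper, and stream-log stands in for your topic name):

```shell
# Print a reassignment JSON in which every partition gets the same
# replica order, so the first broker listed is the preferred leader.
# Arguments: topic name, partition count, comma-separated replica list.
make_reassignment() {
  topic=$1; count=$2; replicas=$3
  printf '{"partitions": [\n'
  p=0
  while [ "$p" -lt "$count" ]; do
    # Every entry except the last one is followed by a comma.
    if [ "$p" -lt $((count - 1)) ]; then sep=","; else sep=""; fi
    printf '  {"topic": "%s", "partition": %d, "replicas": [%s]}%s\n' \
      "$topic" "$p" "$replicas" "$sep"
    p=$((p + 1))
  done
  printf ' ],\n "version":1\n}\n'
}

make_reassignment stream-log 2 0,1
```

Redirecting the output to manual_assign.json gives a file in the same shape as the one above.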

Finally, to force Kafka to update the leader, run:

./kafka-preferred-replica-election.sh

If you do not specify a list of topics, the last command will affect all the topics you have created. That should not be a problem, but keep it in mind.
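
To limit the election to a single topic, the tool can be given a JSON list of partitions. Here is a sketch that generates such a list, assuming your Kafka version's kafka-preferred-replica-election.sh accepts the --path-to-json-file option (check its --help output first; make_election_list is a hypothetical helper):

```shell
# Print a partition list for the election tool: one
# {"topic", "partition"} entry per partition of a single topic.
# Arguments: topic name, partition count.
make_election_list() {
  topic=$1; count=$2
  printf '{"partitions": [\n'
  p=0
  while [ "$p" -lt "$count" ]; do
    # Every entry except the last one is followed by a comma.
    if [ "$p" -lt $((count - 1)) ]; then sep=","; else sep=""; fi
    printf '  {"topic": "%s", "partition": %d}%s\n' "$topic" "$p" "$sep"
    p=$((p + 1))
  done
  printf ' ]\n}\n'
}

make_election_list stream-log 2
```

Save the output to a file and pass that file to the election command, so that only the listed partitions are touched.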

It is worth having a look at this guide, which explains something similar. And if you are curious, you can check the official documentation of the tools here.