We have 25 million records written to the Kafka topic.
- The topic has 24 partitions and 24 consumers.
- Each message is 1 KB, and the messages are wrapped with Avro for serialization and deserialization.
- The replication factor is 2.
- The fetch size is 50,000 and the poll interval is 50 ms.
Right now, during the load test, consuming and processing 1 million records takes about 40 minutes on average. However, we want to process all 25 million records in 20 to 30 minutes. That is roughly 420 records/s today versus the ~14,000 to 21,000 records/s we would need, i.e. a 33x to 50x improvement.
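For concreteness, here is a minimal sketch of how the consumer-side settings above might map onto the Java client. This is not our exact code: the broker address and group id are placeholders, the fetch size of 50,000 is assumed to correspond to `max.poll.records`, and the Avro wrapping is assumed to be Confluent's `KafkaAvroDeserializer`.

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerSetup {
    public static KafkaConsumer<String, Object> create() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");  // placeholder host
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "load-test-group");        // placeholder group id
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50000);            // "fetch size" of 50,000, assumed to mean max.poll.records
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);          // offsets are committed manually via commitSync
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "io.confluent.kafka.serializers.KafkaAvroDeserializer");   // assumed Avro deserializer
        return new KafkaConsumer<>(props);
    }
}
```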
Broker configs:
background.threads = 10
num.network.threads = 7
num.io.threads = 8
replica.lag.time.max.ms = 500
replica.lag.max.messages = 4
log.flush.interval.ms left at its default value (as seen in the logs)
G1 collector instead of the default MarkSweep collector
Broker heap set with -Xms4G and -Xmx4G
Our setup has 8 brokers, each with 3 disks, connected over 10 Gbps Ethernet (simplex network).
Consumer configs:
We are using the Java Consumer API to consume the messages. We have set vm.swappiness to 1 and use 200 threads within each consumer to process the data. Inside the consumer we pick up a message and hit Redis and MapR-DB to perform some business logic; once the logic is complete, we commit the message with a synchronous commit (commitSync), as sketched below.
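Simplified, the per-consumer flow looks roughly like this (a sketch, not the exact code; `processWithRedisAndMaprDb` is a placeholder for the Redis/MapR-DB business logic):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerWorker {
    // 200 worker threads per consumer, as described above
    private static final ExecutorService pool = Executors.newFixedThreadPool(200);

    static void run(KafkaConsumer<String, Object> consumer) throws Exception {
        while (true) {                                                       // runs until the process is stopped
            ConsumerRecords<String, Object> records = consumer.poll(Duration.ofMillis(50)); // 50 ms poll
            List<Future<?>> futures = new ArrayList<>();
            for (ConsumerRecord<String, Object> record : records) {
                futures.add(pool.submit(() -> processWithRedisAndMaprDb(record)));
            }
            for (Future<?> f : futures) {
                f.get();                                                     // wait for the whole batch to finish
            }
            consumer.commitSync();                                           // commit only after processing completes
        }
    }

    private static void processWithRedisAndMaprDb(ConsumerRecord<String, Object> record) {
        // placeholder for the Redis and MapR-DB business logic
    }
}
```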
Each consumer runs with -Xms4G and -Xmx4G. What other aspects should we consider in order to increase the read throughput?