We are running Apache Kafka 2.2 with 5 brokers. We receive about 50 million events per day, but we are seeing high CPU usage on the Kafka brokers. We are using the default producer, consumer, and broker settings.

I have some questions about performance:

We have several kafka-streams applications that perform aggregations or joins to produce enriched messages. All of them use these settings:

  • exactly-once: true
  • min in-sync replicas: 3
  • replication factor: 3
  • topic partitions: ~50 to ~100 (varies by topology)

There may of course also be internal topics for each topology. We scale our workers out to at least 5 application instances, so most of the time we aim for one thread per partition.

Apart from optimizing the topologies, is there anything we can change from the default settings?

Besides the kafka-streams applications, we also use spring-kafka producers and consumers, again with default settings. For example, on the producer side we send events one at a time.

Our throughput is not high enough and CPU usage is high. If we shut down some of our kafka-streams applications, the load on the brokers decreases. So my question is:

Does exactly-once combined with min in-sync replicas of 3 and a replication factor of 3 put too much load on the brokers? I do not want to lose or duplicate messages in production, so my streams applications must keep exactly-once enabled, but the Spring applications do fine without kafka-streams.

I want to reduce the overall CPU usage of the brokers in our system.

If I batch messages on the producer side and lower min in-sync replicas for the workers that can tolerate faults, will CPU usage decrease?

I can't think of an optimal approach. Any idea about why broker CPU usage is so high, around 80-90% during the day, would help.

What can cause high CPU usage on the brokers?

What is the instance type of your brokers? How many cores and how much memory does each broker have? - Ajay Kr Choudhary

1 Answer

You need to give more detailed information about your topology and cluster. For example:

  • Do you see the CPU spike on all of your brokers or only on a few? This might lead you to the main problem more easily.
  • Is encryption enabled? Encryption is generally the main culprit of high CPU usage. Maybe some of your applications use encrypted channels and others don't?
  • Examine the topologies of your streams applications. Poor key choices can cause excess repartitioning, which goes through repartition topics in Kafka and can drive up CPU usage on both the brokers and your applications.
  • Are your consumers frequently closed or restarted, or do they fail to finish their work within max.poll.interval.ms? If so, they will be rebalanced frequently. The brokers' share of a rebalance is small, but it can add up significantly if many groups are rebalancing all the time.
  • Unless you use synchronous sends with your producers (which hurts throughput significantly), records are batched and sent together. You can still try tuning the linger.ms configuration, which can affect your producer throughput.
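
To make the last two points concrete, here is a minimal sketch of the client settings involved, written as plain java.util.Properties. The configuration keys are real Kafka client settings, but the values, the broker address, the group id, and the class name ClientTuningSketch are illustrative assumptions, not recommendations for this cluster. Serializers and the actual producer/consumer construction are left out.

```java
import java.util.Properties;

public class ClientTuningSketch {

    // Producer settings that give the client more room to batch records.
    // All values are assumptions for illustration; measure before adopting them.
    public static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        p.setProperty("acks", "all");          // wait for all in-sync replicas
        p.setProperty("linger.ms", "20");      // wait up to 20 ms to fill a batch (default 0)
        p.setProperty("batch.size", "65536");  // 64 KiB per-partition batch (default 16384)
        return p;
    }

    // Consumer settings related to the max.poll.interval.ms point above.
    public static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        p.setProperty("group.id", "example-group");           // hypothetical group id
        p.setProperty("max.poll.records", "200");             // fewer records per poll (default 500)
        p.setProperty("max.poll.interval.ms", "300000");      // 5 minutes, the default
        return p;
    }

    public static void main(String[] args) {
        // Print the settings so they can be compared with what the applications really use.
        for (String key : producerProps().stringPropertyNames()) {
            System.out.println("producer " + key + "=" + producerProps().getProperty(key));
        }
        for (String key : consumerProps().stringPropertyNames()) {
            System.out.println("consumer " + key + "=" + consumerProps().getProperty(key));
        }
    }
}
```

Fewer, larger produce requests generally mean less per-request work on the brokers, at the cost of a little extra latency per record. Whether that matters at your event rate is something only your own measurements can show.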

I don't think the replication.factor and min.insync.replicas configurations play a significant role, but I can't say anything for certain without knowing all the variables.
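
For reference, here is a rough sketch of how the settings from the question map to Kafka Streams properties. The keys are real Streams settings; the application id, broker address, and class name StreamsSettingsSketch are hypothetical. One thing worth knowing: with processing.guarantee set to exactly_once, commit.interval.ms defaults to 100 ms instead of 30000 ms, and every commit is a transaction commit handled by the brokers. Whether that explains the load in this cluster is an assumption you would need to verify. Note that min.insync.replicas is a topic/broker setting, so it does not appear here.

```java
import java.util.Properties;

public class StreamsSettingsSketch {

    // Streams settings matching the question's description (values assumed).
    public static Properties streamsProps() {
        Properties p = new Properties();
        p.setProperty("application.id", "example-streams-app"); // hypothetical id
        p.setProperty("bootstrap.servers", "localhost:9092");   // placeholder address
        p.setProperty("processing.guarantee", "exactly_once");  // transactional processing
        p.setProperty("replication.factor", "3");               // applies to internal topics
        // Stated explicitly for visibility: 100 ms is already the default under
        // exactly_once (30000 ms under at_least_once). Raising it trades
        // end-to-end latency for fewer transaction commits.
        p.setProperty("commit.interval.ms", "100");
        return p;
    }

    public static void main(String[] args) {
        Properties p = streamsProps();
        for (String key : p.stringPropertyNames()) {
            System.out.println(key + "=" + p.getProperty(key));
        }
    }
}
```

If the broker load turns out to track the number of running Streams tasks, trying a larger commit.interval.ms on one application and watching the broker CPU would be a cheap experiment.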

Also, if you have monitoring tools installed, check them to see whether anything looks unusual.