0 votes

I found this article while working on measuring my producer throughput. In it, Jay Kreps reports that with a single producer thread and 3x synchronous partition replication he gets 421,823 records/sec. His records are 100 bytes each, spread across 6 partitions on 6 brokers. He also uses a callback-based send, which means he can still guarantee message ordering.
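To put those numbers in perspective, here is a quick back-of-the-envelope check (the record size and rates are the ones quoted above; everything else is just arithmetic):

```java
// Sanity-check the article's numbers: 421,823 records/sec at 100 bytes each,
// versus my observed 23 records/sec.
public class ThroughputCheck {
    public static void main(String[] args) {
        long articleRecordsPerSec = 421_823;
        int recordBytes = 100;

        double mbPerSec = articleRecordsPerSec * (double) recordBytes / (1024 * 1024);
        System.out.printf("article: ~%.1f MB/s%n", mbPerSec); // roughly 40 MB/s

        long myRecordsPerSec = 23;
        System.out.println("slowdown factor: " + (articleRecordsPerSec / myRecordsPerSec));
    }
}
```

That gap (four orders of magnitude) is far too large to be explained by hardware or replication settings alone, which is why I suspect something fundamental in my setup.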

I am using Kafka as a service with a single broker, 6 partitions, and 1x replication. I send records of roughly the same size and get 23 records/sec. Unlike Jay, I am using Schema Registry for Avro serialization. I have tried every send variant the Kafka Producer API provides:

  • calling .get on the future
  • sending messages with a callback
  • sending messages without a callback

I am not even remotely close to the number above. I want to guarantee message ordering, so I would like to at least pass a callback along with each record.

I am aware that matching his benchmark would be difficult, and that is not my goal. I just feel like I am missing something fundamental. Can I ask for some suggestions? I will provide as much additional context as necessary.

If you use the kafka-producer-perf-test script, what do you get? — OneCricketeer
@cricket_007 This helped me verify that the Kafka I'm using is able to push around 900 records/sec of 100 bytes. — zaxme
Given that the article you found was many, many Kafka releases ago, that seems very low. — OneCricketeer

1 Answer

0 votes

So after some research it turned out that I was making a blocking call to the schema registry for every batch sent out to Kafka. Once that was dealt with, throughput shot up to 8,500 records/sec.
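The general shape of the fix is to memoize the registry lookup so the blocking round trip happens once per schema rather than once per batch. A minimal sketch of that idea, with a stub lambda standing in for the real registry client call (the subject name and returned id are made up):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Memoize a slow, blocking lookup: it runs once per subject, not once per batch.
public class SchemaIdCache {
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();
    private final Function<String, Integer> registryLookup; // stand-in for the real client call

    public SchemaIdCache(Function<String, Integer> registryLookup) {
        this.registryLookup = registryLookup;
    }

    public int idFor(String subject) {
        return cache.computeIfAbsent(subject, registryLookup);
    }

    public static void main(String[] args) {
        AtomicInteger remoteCalls = new AtomicInteger();
        SchemaIdCache ids = new SchemaIdCache(subject -> {
            remoteCalls.incrementAndGet(); // pretend this is an HTTP round trip
            return 42;
        });
        for (int batch = 0; batch < 1_000; batch++) {
            ids.idFor("payments-value"); // hot path: no remote call after the first batch
        }
        System.out.println("remote calls: " + remoteCalls.get()); // prints "remote calls: 1"
    }
}
```

The Confluent serializers maintain a cache like this internally; the point is to configure and use them so that the cache actually gets hit, as described below.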

If you are using a schema registry and you don't want it called on every send:

  • make sure that auto.register.schemas is set to false
  • if you are using GenericRecord, make sure your Schema objects are referentially the same — reuse one parsed Schema instance rather than re-parsing the schema for every record, so the serializer's cache actually hits
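If you are on the Confluent Avro serializer, the first bullet corresponds to a producer config along these lines (a sketch — the broker and registry URLs are placeholders):

```properties
bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
schema.registry.url=http://localhost:8081
# Don't try to register the schema on every produce; assume it is registered already.
auto.register.schemas=false
```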