I am trying to identify the maximum throughput I can achieve in a sample Kafka cluster on AWS. I have configured 2 Kafka brokers on two EC2 instances, and I am using the ProducerPerformance tool to measure the throughput, as shown below.

./bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --throughput=10000 --topic=TOPIC --num-records=50000000 --record-size=200 --producer-props bootstrap.servers=SERVERS buffer.memory=67108864 batch.size=64000

I would appreciate help with the following questions.

  1. What does the throughput parameter mean?

In the documentation I found this description: "throttle maximum message throughput to approximately THROUGHPUT messages/sec".

However, I also noticed that -1 is sometimes passed as the value of --throughput (for example here: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines ).

  2. When we run the script, we get output such as 821,557 records/sec (78.3 MB/sec). How does this value relate to the throughput parameter we set earlier?

Thank you.

1 Answer

  1. If throughput is set to -1, the perf tool does no throttling and sends records as fast as it can. If it is set to a positive value, the tool throttles itself to keep the send rate as close to that target as possible. For example, with throughput set to 1000, the perf tool sends approximately 1000 records per second.

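  2. In a test where throughput is -1, the output shows that a single producer instance was able to send 821,557 records per second with no throttling. The --throughput value is only an upper limit on the send rate; the output always reports the rate that was actually measured. Comparing that rate against the available network bandwidth and the average record size should help you identify the bottleneck.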
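
To make both points concrete, here is a minimal Python sketch. It is not the perf tool's own code: `should_throttle` and `mb_per_sec` are hypothetical helper names, the throttle check is just one simple way a rate cap could work, and the conversion assumes 1 MB = 1024 * 1024 bytes. The 100-byte record size used for the 821,557 records/sec figure is also an assumption, since the record size for that run is not given in the question.

```python
def should_throttle(sent_so_far: int, elapsed_sec: float, target_per_sec: float) -> bool:
    """Return True when the sender is ahead of the target rate and should pause."""
    if target_per_sec < 0:
        # Negative target (e.g. -1): throttling is disabled, send at full speed.
        return False
    return elapsed_sec > 0 and sent_so_far / elapsed_sec > target_per_sec


def mb_per_sec(records_per_sec: float, record_size_bytes: int) -> float:
    """Convert a record rate to MB/sec (assuming 1 MB = 1024 * 1024 bytes)."""
    return records_per_sec * record_size_bytes / (1024 * 1024)


# Throttle decisions one second into a run with 15000 records already sent.
print(should_throttle(15000, 1.0, 10000))  # ahead of a 10000/sec target
print(should_throttle(15000, 1.0, -1))     # throttling disabled

# The command in the question: capped at 10000 records/sec, 200 bytes each.
print(round(mb_per_sec(10000, 200), 2), "MB/sec at the cap")
print(round(50_000_000 / 10000 / 60, 1), "minutes to send all records at the cap")

# The unthrottled figure, ASSUMING 100-byte records (not stated in the question).
print(round(mb_per_sec(821_557, 100), 2), "MB/sec")
```

With --throughput=10000 and 200-byte records, the cap works out to roughly 1.9 MB/sec, so a run like the one in the question measures the cap rather than the cluster's limit. To find the maximum, the run would need --throughput=-1.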