I have set up a sample Kafka cluster on AWS and am trying to identify the maximum throughput achievable with the given configuration. I am currently following the post provided here for this analysis.
I would appreciate it if you could help me clarify the following points.
I observed a throughput of 40 MB/s for messages of size 512 bytes (single producer, single consumer) with the given hardware. Assume I need to achieve a throughput of 80 MB/s.
As I understand it, one way to do this is to increase the number of partitions per topic and increase the number of threads in the producer and consumer (assuming I do not change the default values for batch size, compression ratio, etc.).
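To make sure I understand the partition side of that, here is roughly what I have in mind for growing a topic's partition count. This is only a minimal sketch using the Kafka AdminClient; the broker address, topic name, and the target of 8 partitions are placeholders, not my actual values:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewPartitions;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class IncreasePartitions {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Grow the topic to 8 partitions so that up to 8 producer/consumer
            // threads can work in parallel (at most one consumer thread per partition).
            admin.createPartitions(
                    Collections.singletonMap("perf-test-topic", NewPartitions.increaseTo(8)))
                 .all()
                 .get();
        }
    }
}
```

My understanding is that the consumer side then scales by running up to 8 consumer threads in the same consumer group, one per partition; please correct me if that is wrong.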
1. How do I find the maximum throughput possible with the given hardware, i.e. the point beyond which the hardware itself has to be upgraded to improve throughput any further?
(In other words, how do I make a decision of the form "with X GB RAM and Y GB disk space this is the maximum throughput I can achieve; if I need to improve the throughput further I have to upgrade the RAM to XX GB and the disk space to YY GB"? The load-stepping sketch after these two questions shows the kind of empirical test I have in mind.)
2. Should we scale the cluster vertically or horizontally? What is the recommended approach?
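Regarding question 1, the only approach I can think of is empirical: step up the number of concurrent producers and watch where the aggregate throughput stops growing, while monitoring broker CPU, disk, and network. Below is a rough sketch of what I mean; the broker address, topic name, record count per thread, and thread counts are all placeholders, and this is a naive load generator rather than a polished benchmark:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThroughputCeilingProbe {
    static final String BROKERS = "broker1:9092";     // placeholder
    static final String TOPIC = "perf-test-topic";    // placeholder
    static final int RECORD_SIZE = 512;               // bytes, matching my test
    static final int RECORDS_PER_THREAD = 1_000_000;  // placeholder load per thread

    // Runs `threads` producers in parallel and returns the aggregate MB/s observed.
    static double measure(int threads) throws InterruptedException {
        byte[] payload = new byte[RECORD_SIZE];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.currentTimeMillis();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                Properties props = new Properties();
                props.put("bootstrap.servers", BROKERS);
                props.put("key.serializer", ByteArraySerializer.class.getName());
                props.put("value.serializer", ByteArraySerializer.class.getName());
                // batch.size, linger.ms, compression.type left at their defaults
                try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                    for (int i = 0; i < RECORDS_PER_THREAD; i++) {
                        producer.send(new ProducerRecord<>(TOPIC, payload));
                    }
                    producer.flush();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        double seconds = (System.currentTimeMillis() - start) / 1000.0;
        double totalMb = (double) threads * RECORDS_PER_THREAD * RECORD_SIZE / (1024.0 * 1024.0);
        return totalMb / seconds;
    }

    public static void main(String[] args) throws InterruptedException {
        // Double the producer count each round; the point where MB/s stops growing
        // should be where some broker resource (disk, network, CPU) saturates.
        for (int threads = 1; threads <= 16; threads *= 2) {
            System.out.printf("threads=%d -> %.1f MB/s%n", threads, measure(threads));
        }
    }
}
```

Would stepping through something like this be a sound way to find the hardware ceiling, or is there a more direct way to derive it from the hardware specs?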
Thank you.