I am using HornetQ embedded in the JBoss 6.1 application server. My applications (a client app that produces messages and a JBoss app that consumes them) cannot handle more than 4000 TPS on one server, even though the CPU is still 60% idle. I tried disabling persistence to check whether I was disk-bound, but it did not improve the throughput.
The problem seems to be on the producer side. When I monitor the queue size, it stays very small, which suggests the consumers are not the bottleneck.
Should I use several queues to be more efficient? I have already read the HornetQ performance tuning documentation, but could not find the reason for this. Or maybe it is because I am using AUTO_ACKNOWLEDGE mode? I am running several producer threads, so this should not have much impact. Besides, the producer JVM cannot use more than one CPU thread anyway. I even tried running several instances of my producer application, but it did not go any faster.

The network bandwidth is high (1 Gbps) and my messages are very small (< 1 KB). The producer and consumer applications run on the same server, and HornetQ is configured in a JBoss cluster of 2 servers.
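To back up the claim that bandwidth is not the limit, here is a rough back-of-the-envelope check. It is a sketch that assumes every message is a full 1 KB (1024 bytes, the upper bound from the question) and counts payload only, ignoring JMS headers, protocol framing and acknowledgements. The class and method names are purely illustrative.

```java
// Rough check: how much of a 1 Gbps link does the observed message rate need?
class BandwidthCheck {
    // Payload bandwidth in Mbit/s for a given message rate and message size.
    // Ignores headers, framing and acks, so the real figure will be somewhat higher.
    static double requiredMbps(double messagesPerSecond, double messageBytes) {
        return messagesPerSecond * messageBytes * 8.0 / 1_000_000.0;
    }

    public static void main(String[] args) {
        double tps = 4000;             // observed ceiling
        double maxMessageBytes = 1024; // "< 1 KB", taken as an upper bound
        double linkMbps = 1000;        // 1 Gbps link
        double needed = requiredMbps(tps, maxMessageBytes);
        System.out.printf("Payload bandwidth: %.1f Mbit/s (%.1f%% of the link)%n",
                needed, 100.0 * needed / linkMbps);
    }
}
```

This prints about 32.8 Mbit/s, roughly 3% of the link. Even if each message crosses the network twice (producer to broker, broker to consumer) and protocol overhead is added, the traffic stays far below 1 Gbps.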