I found a problem in my production environment.
We have 6 queues in a RabbitMQ cluster and a thread pool of 200 threads (in practice slightly more, since some special tasks are scheduled on a standalone thread pool) that handles requests from upstream. While handling a request, the thread publishes a message to the RabbitMQ broker.
So I have 200 threads publishing messages to these 6 queues.
For each queue, I create one AMQP connection, and each thread keeps its own Channel in a ThreadLocal, so that it can publish without synchronization (a Channel is not thread-safe).
So I actually have 1200 channels open (200 threads × 6 connections). The request rate is around 4000 QPS, and somewhat higher at certain peak times.
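To make the layout above concrete, here is a minimal, dependency-free sketch of it: one connection per queue, and one ThreadLocal channel per thread per connection. `FakeConnection` and `FakeChannel` are hypothetical stand-ins for the client's `Connection` and `Channel` so that no broker is needed; in the real client the channel would come from `Connection.createChannel()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ChannelLayout {
    // Hypothetical stand-in for a RabbitMQ Connection (one per queue).
    static final class FakeConnection {
        final AtomicInteger channelsOpened = new AtomicInteger();
        FakeChannel createChannel() {
            channelsOpened.incrementAndGet();
            return new FakeChannel(this);
        }
    }

    // Hypothetical stand-in for a RabbitMQ Channel (not thread-safe).
    static final class FakeChannel {
        final FakeConnection connection;
        FakeChannel(FakeConnection connection) { this.connection = connection; }
        void publish(byte[] body) { /* a real channel would write frames here */ }
    }

    final List<FakeConnection> connections = new ArrayList<>();
    final List<ThreadLocal<FakeChannel>> channels = new ArrayList<>();

    ChannelLayout(int queueCount) {
        for (int i = 0; i < queueCount; i++) {
            FakeConnection conn = new FakeConnection();
            connections.add(conn);
            // Each thread lazily opens its own channel on this connection.
            channels.add(ThreadLocal.withInitial(conn::createChannel));
        }
    }

    void publish(int queueIndex, byte[] body) {
        channels.get(queueIndex).get().publish(body);
    }

    int totalChannels() {
        return connections.stream().mapToInt(c -> c.channelsOpened.get()).sum();
    }

    static int run(int threads, int queues) {
        ChannelLayout layout = new ChannelLayout(queues);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // The latch forces all tasks to run at once, so each runs on a distinct pool thread.
        CountDownLatch allStarted = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                allStarted.countDown();
                try {
                    allStarted.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int q = 0; q < queues; q++) {
                    layout.publish(q, new byte[0]);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return layout.totalChannels();
    }

    public static void main(String[] args) {
        // 200 threads x 6 connections = 1200 channels
        System.out.println(run(200, 6));
    }
}
```

The point of the model is that although every thread owns its channels, there are still only 6 connections underneath them.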
However, I found that all 200 threads were exhausted, and most of them were in the BLOCKED state, like this:
DubboServerHandler-10.12.26.124:9000-thread-200 - priority:10 - threadId:0x00007f6708030800 - nativeId:0x680d - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
at com.rabbitmq.client.impl.SocketFrameHandler.writeFrame(SocketFrameHandler.java:170)
- waiting to lock <0x0000000738ad0190> (a java.io.DataOutputStream)
at com.rabbitmq.client.impl.AMQConnection.writeFrame(AMQConnection.java:542)
at com.rabbitmq.client.impl.AMQCommand.transmit(AMQCommand.java:104)
- locked <0x000000074e085338> (a com.rabbitmq.client.impl.CommandAssembler)
at com.rabbitmq.client.impl.AMQChannel.quiescingTransmit(AMQChannel.java:337)
- locked <0x000000074656eeb0> (a java.lang.Object)
at com.rabbitmq.client.impl.AMQChannel.transmit(AMQChannel.java:313)
- locked <0x000000074656eeb0> (a java.lang.Object)
at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:686)
at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:668)
at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:658)
at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.basicPublish(AutorecoveringChannel.java:192)
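The trace shows the contended monitor is a `java.io.DataOutputStream` inside `SocketFrameHandler.writeFrame`, which suggests the lock belongs to the connection's socket rather than to any one channel. The sketch below is a simplified, dependency-free model under that assumption: `ModelConnection` and `ModelChannel` are hypothetical classes, and a 1 ms sleep stands in for a slow socket write.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSocketModel {
    // Hypothetical model of one connection: a single output stream whose
    // monitor every writer must hold, as the trace shows writeFrame waiting on.
    static final class ModelConnection {
        private final Object outputStreamLock = new Object();
        final AtomicInteger writersInside = new AtomicInteger();
        final AtomicInteger maxConcurrentWriters = new AtomicInteger();

        void writeFrame() {
            synchronized (outputStreamLock) {
                int now = writersInside.incrementAndGet();
                maxConcurrentWriters.accumulateAndGet(now, Math::max);
                // Simulate a slow socket write.
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                writersInside.decrementAndGet();
            }
        }
    }

    // Hypothetical model of a channel: private to one thread, but it writes through its connection.
    static final class ModelChannel {
        private final ModelConnection connection;
        ModelChannel(ModelConnection connection) { this.connection = connection; }
        void basicPublish() { connection.writeFrame(); }
    }

    /** Returns the peak number of threads writing at the same time on one connection. */
    static int peakConcurrentWrites(int threads, int publishesPerThread) {
        ModelConnection conn = new ModelConnection();
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            ModelChannel channel = new ModelChannel(conn); // distinct channel per thread
            new Thread(() -> {
                for (int i = 0; i < publishesPerThread; i++) {
                    channel.basicPublish();
                }
                done.countDown();
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return conn.maxConcurrentWriters.get();
    }

    public static void main(String[] args) {
        // Even with 50 distinct channels, the model lets only one write proceed at a time.
        System.out.println(peakConcurrentWrites(50, 5));
    }
}
```

Under this model, a single stalled write would block every thread publishing on the same connection, regardless of how many channels it has.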
This is my jstack report: http://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTgvMDIvMTEvLS0yNjE3OS50eHQtLTMtNTMtMzg=
My questions are:
1. Why are threads that publish on different channels all trying to acquire the same lock?
2. What could cause this, given that it only happens tens of times a day?
3. Is my implementation a poor one? How can I improve it?