
I found a problem in my production environment.

We have 6 queues in a RabbitMQ cluster, and a thread pool with 200 threads (actually more, since some special tasks are scheduled in a standalone thread pool) to handle requests from upstream. While handling a request, I publish a message to the RabbitMQ broker.

So I have 200 threads publishing messages to these 6 queues.

For each queue I create one AMQP connection, and for each thread I keep a ThreadLocal<Channel>, so that every thread has its own channel and needs no synchronization, since a Channel is not thread safe.

So in total I have 1200 open channels. The request rate is around 4000 qps, and a bit higher at peak times. The setup is roughly as sketched below.
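
For reference, the publishing side looks roughly like this (a minimal sketch; all class and field names apart from the RabbitMQ client API are made up for illustration, and error handling/recovery is simplified):

    import java.io.IOException;
    import java.util.concurrent.TimeoutException;

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class PerThreadChannelPublisher {

        private final Connection connection;

        // Each worker thread lazily opens its own Channel on the shared connection,
        // so no two threads ever touch the same (non-thread-safe) Channel instance.
        private final ThreadLocal<Channel> channelHolder = ThreadLocal.withInitial(() -> {
            try {
                return connection.createChannel();
            } catch (IOException e) {
                throw new IllegalStateException("could not open channel", e);
            }
        });

        public PerThreadChannelPublisher(ConnectionFactory factory)
                throws IOException, TimeoutException {
            // One connection per queue; six instances of this class exist in total.
            this.connection = factory.newConnection();
        }

        // Called from one of the 200 handler threads.
        public void publish(String exchange, String routingKey, byte[] body) throws IOException {
            channelHolder.get().basicPublish(exchange, routingKey, null, body);
        }
    }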

But I found that all 200 threads were exhausted, and most of them were in the BLOCKED state, like this:

    DubboServerHandler-10.12.26.124:9000-thread-200 - priority:10 - threadId:0x00007f6708030800 - nativeId:0x680d - state:BLOCKED
    stackTrace:
    java.lang.Thread.State: BLOCKED (on object monitor)
    at com.rabbitmq.client.impl.SocketFrameHandler.writeFrame(SocketFrameHandler.java:170)
    - waiting to lock <0x0000000738ad0190> (a java.io.DataOutputStream)
    at com.rabbitmq.client.impl.AMQConnection.writeFrame(AMQConnection.java:542)
    at com.rabbitmq.client.impl.AMQCommand.transmit(AMQCommand.java:104)
    - locked <0x000000074e085338> (a com.rabbitmq.client.impl.CommandAssembler)
    at com.rabbitmq.client.impl.AMQChannel.quiescingTransmit(AMQChannel.java:337)
    - locked <0x000000074656eeb0> (a java.lang.Object)
    at com.rabbitmq.client.impl.AMQChannel.transmit(AMQChannel.java:313)
    - locked <0x000000074656eeb0> (a java.lang.Object)
    at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:686)
    at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:668)
    at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:658)
    at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.basicPublish(AutorecoveringChannel.java:192)

This is my jstack report: http://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTgvMDIvMTEvLS0yNjE3OS50eHQtLTMtNTMtMzg=

My questions are:

  1. Why are threads publishing on different channels all trying to acquire the same lock?
  2. What could cause this, given that it only happens a few dozen times a day?
  3. Is my implementation poor? How can I improve it?
Jaskey - I recommend that you mark the answer provided by "ilooner" as correct and continue the discussion on the RabbitMQ mailing list here - groups.google.com/d/topic/rabbitmq-users/15cv2qroCps/discussion - Luke Bakken

1 Answer

  1. Looking at the source code, each connection has a SocketFrameHandler. The frame handler synchronizes on the output stream in its writeFrame method: https://github.com/rabbitmq/rabbitmq-java-client/blob/master/src/main/java/com/rabbitmq/client/impl/SocketFrameHandler.java . This means that if you have 200 channels (one per thread) sharing a connection, only one thread can send data at a time, because of the synchronization in that write method.
  2. I'm not sure. How many messages per second do you typically send to RabbitMQ? Are there periods during the day where you send very few messages and other periods where you send many?
  3. Because of the synchronization in SocketFrameHandler, I don't see any benefit to having more than one thread and channel per connection. Try refactoring your application so that any data that needs to be sent is pushed into an in-memory queue, and one thread is responsible for reading from that queue and sending the data to RabbitMQ. This way many threads can do the work of producing data, while a single thread is responsible for sending it (see the sketch after this list).
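
A minimal sketch of that idea (the names, the queue bound, and the single-queue setup are assumptions for illustration, not taken from the question; in your case you would run one of these per connection/queue and add reconnection handling):

    import java.io.IOException;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeoutException;

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class SingleThreadPublisher {

        private static final class Outgoing {
            final String exchange;
            final String routingKey;
            final byte[] body;

            Outgoing(String exchange, String routingKey, byte[] body) {
                this.exchange = exchange;
                this.routingKey = routingKey;
                this.body = body;
            }
        }

        // Bounded, so a slow broker applies back-pressure to the worker threads
        // instead of letting messages pile up in memory without limit.
        private final BlockingQueue<Outgoing> pending = new LinkedBlockingQueue<>(10_000);

        public SingleThreadPublisher(ConnectionFactory factory)
                throws IOException, TimeoutException {
            Connection connection = factory.newConnection();
            Channel channel = connection.createChannel();

            // The only thread that ever touches this channel and connection.
            Thread sender = new Thread(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        Outgoing msg = pending.take();
                        channel.basicPublish(msg.exchange, msg.routingKey, null, msg.body);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (IOException e) {
                    // Real code would log this and trigger reconnection/recovery.
                }
            }, "rabbitmq-publisher");
            sender.setDaemon(true);
            sender.start();
        }

        // Called by any worker thread; blocks only if the in-memory queue is full.
        public void publish(String exchange, String routingKey, byte[] body) throws InterruptedException {
            pending.put(new Outgoing(exchange, routingKey, body));
        }
    }

This keeps all 200 request threads away from socket I/O; they only enqueue, and the single sender thread is the only one contending for the connection's output stream.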