WebSphere MQ: Message keeps toggling between input queue and backout queue

Question

The logic flow is as follows:

A message is sent to an input queue.
A ProcessorMDB's onMessage() is invoked. Within this method several operations and validations are performed.
In the case of a poison message (a message that the application code cannot handle), a RuntimeException is thrown.
This should roll back the transaction, and we see evidence of the rollback in the log file.
A backout threshold is defined, along with a backout queue name.
Once the threshold is reached, the message is sent to the backout queue.
But immediately afterwards it starts going back and forth between the input queue and the backout queue.
We are using the MQMON tool to observe this weird behavior. It continues almost forever, even after the app server (where the MDB is running) is shut down.
We are using WebLogic 10.3.1 and WebSphere MQ 6.02.

Any help would be much appreciated; we are running out of ideas.

Perhaps you can post some code that handles the rollback of the transaction? When viewing the message that is bouncing back and forth, can you verify that it is the exact same message with the exact same header properties (i.e. redelivery count and such)? Also, are you using your own backout queue for poison messages, or are you using a system DLQ? — gregwhitaker
@gwhitake: No, we have defined an "ErrorQ" as the backout queue. There is no code to handle the transaction rollback; we are using MDBs with CMT. As I stated, we simply throw a runtime exception if an error is encountered in processing. We are unable to verify the message headers because the message keeps jumping between queues and we cannot catch hold of it, i.e. by the time we refresh or try to view the details, the message has already gone back to the input queue. It is weird behavior, which is why we are reaching out. — SCS075

T.Rob · Accepted Answer · 2011-05-22T02:53:03

This sounds like a syncpoint issue. If the QMgr were to issue a COMMIT when a message is requeued inside of a unit of work it would affect all messages under syncpoint inside of that thread. This would cause serious problems if an application had performed several PUT or GET calls prior to hitting the poison message. Rather than issue a COMMIT outside of the program's control, the QMgr just leaves the message on the backout queue inside the unit of work and waits for the program to issue the COMMIT. This can lead to some unexpected behavior such as what you are seeing where a message lands back on the input queue.

If another message is in the queue behind the "bad" one and it is processed successfully by the same thread, everything works out perfectly. The app issues a COMMIT on the new message and this also affects the poison message on the Backout Queue. However if the thread were to exit uncleanly (without an explicit disconnect or COMMIT) then the transaction is rolled back and the poison message is returned to the input queue.
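
The two outcomes described above can be illustrated with a toy in-memory model. This is only a sketch of the reasoning and does not use the WebSphere MQ API: the class `SyncpointSketch` and all of its methods are hypothetical names invented for the illustration, and the model assumes a single thread with one unit of work.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of a unit of work (hypothetical; not the WebSphere MQ API). */
class SyncpointSketch {
    final Deque<String> inputQueue = new ArrayDeque<>();
    final List<String> backoutQueue = new ArrayList<>();
    // Requeues performed inside the unit of work but not yet committed.
    private final List<String> pendingRequeues = new ArrayList<>();

    /** The head message hit the backout threshold: requeue it under syncpoint. */
    void requeuePoisonMessage() {
        pendingRequeues.add(inputQueue.removeFirst());
    }

    /** A good message is consumed and the application commits. */
    void processAndCommit() {
        inputQueue.removeFirst();
        commit();
    }

    /** COMMIT makes every pending requeue permanent. */
    void commit() {
        backoutQueue.addAll(pendingRequeues);
        pendingRequeues.clear();
    }

    /** Models a thread ending without COMMIT: pending requeues are undone. */
    void rollback() {
        // Restore messages to the head of the input queue in their original order.
        for (int i = pendingRequeues.size() - 1; i >= 0; i--) {
            inputQueue.addFirst(pendingRequeues.get(i));
        }
        pendingRequeues.clear();
    }

    public static void main(String[] args) {
        // Case 1: a good message follows the poison one and forces the COMMIT.
        SyncpointSketch a = new SyncpointSketch();
        a.inputQueue.add("poison");
        a.inputQueue.add("good");
        a.requeuePoisonMessage();
        a.processAndCommit();
        System.out.println("commit:   input=" + a.inputQueue + " backout=" + a.backoutQueue);

        // Case 2: the thread ends uncleanly, so the requeue is rolled back.
        SyncpointSketch b = new SyncpointSketch();
        b.inputQueue.add("poison");
        b.requeuePoisonMessage();
        b.rollback();
        System.out.println("rollback: input=" + b.inputQueue + " backout=" + b.backoutQueue);
    }
}
```

In the first case the poison message stays on the backout queue; in the second it ends up back on the input queue, which is the toggling pattern described in the question.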

The usual way of dealing with this is that the next good message (or batch of messages if transactions are batched) in the input queue will force the COMMIT. However in some cases where the owning thread gets no new work (perhaps it was performing a GET by Correlation ID) there is nothing to push the bad message through. In these cases, it is important to make sure that the application issues a COMMIT before ending. One way to do this is to write the code to perform the GET by CORRELID with a wait interval. If the wait interval expires, the application would get a return code of 2033 and then issue a COMMIT before closing the thread. If the reply message is legitimately late for whatever reason, the COMMIT will have no effect. But if the message arrived and had been backed out and requeued, the COMMIT will cause it to stay in the Backout Queue.
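
The "GET with a wait interval, then COMMIT on 2033" pattern might look roughly like the sketch below. The `ReplyQueue` interface, `QueueException`, and the `TimedOutQueue` stub are hypothetical stand-ins so that the example runs without an MQ installation; only the numeric reason code 2033 (MQRC_NO_MSG_AVAILABLE) comes from MQ itself. In real code the same shape would be applied to whichever MQ or JMS client classes the application already uses.

```java
import java.util.ArrayList;
import java.util.List;

class ReplyWaitSketch {
    /** Reason code returned when the wait interval expires with no message. */
    static final int MQRC_NO_MSG_AVAILABLE = 2033;

    /** Hypothetical exception carrying an MQ-style reason code. */
    static class QueueException extends Exception {
        final int reasonCode;
        QueueException(int reasonCode) {
            super("reason code " + reasonCode);
            this.reasonCode = reasonCode;
        }
    }

    /** Hypothetical stand-in for a queue handle opened under syncpoint. */
    interface ReplyQueue {
        String getByCorrelId(String correlId, int waitMillis) throws QueueException;
        void commit();
        void close();
    }

    /** Returns the reply, or null on timeout; commits before closing either way. */
    static String awaitReply(ReplyQueue queue, String correlId, int waitMillis) {
        try {
            String reply;
            try {
                reply = queue.getByCorrelId(correlId, waitMillis);
            } catch (QueueException e) {
                if (e.reasonCode != MQRC_NO_MSG_AVAILABLE) {
                    throw new IllegalStateException("unexpected reason code", e);
                }
                reply = null; // wait interval expired
            }
            // Harmless if nothing is pending; otherwise it makes a pending
            // poison-message requeue permanent.
            queue.commit();
            return reply;
        } finally {
            queue.close();
        }
    }

    /** Stub that always times out and records the calls made on it. */
    static class TimedOutQueue implements ReplyQueue {
        final List<String> calls = new ArrayList<>();
        public String getByCorrelId(String correlId, int waitMillis) throws QueueException {
            calls.add("get");
            throw new QueueException(MQRC_NO_MSG_AVAILABLE);
        }
        public void commit() { calls.add("commit"); }
        public void close() { calls.add("close"); }
    }

    public static void main(String[] args) {
        TimedOutQueue queue = new TimedOutQueue();
        String reply = awaitReply(queue, "CORREL-1", 5000);
        System.out.println("reply=" + reply + " calls=" + queue.calls);
    }
}
```

The point of the structure is that the COMMIT is issued on the timeout path as well as the success path, so the thread never ends with an uncommitted requeue still inside its unit of work.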

One way to see exactly what is going on is to run a trace against the queue in question. You can use the built-in trace function, strmqtrc, which has a few more options in V7 than in V6. However, if you want very fine-grained control, you can use the trace exit in SupportPac MA0W. With MA0W you can see exactly which API calls are made by the program and which are made on its behalf.

[EDIT] Updating the response with some info from the PMR:

The following is from the WMQ V7 Infocenter:

"MessageConsumers are single threaded below the Session level, and any requeuing of poison messages takes place within the current unit of work. This does not affect the operation of the application, however when poison messages are requeued under a transacted or Client_acknowledge Session, the requeue action itself will not be committed until the current unit of work is committed by the application code or, if appropriate, the application container code."

Hence, if it is important for the customer to have poison messages committed immediately after they are backed out, it is recommended they either make use of the Application Server Facilities (ConnectionConsumer) which can commit the message immediately, or another mechanism to move poison messages from the queue.

Here is the link to this information in the V6 and V7 Information Centers. Since you are using the V6 client, you would want to refer to the V6 Infocenter. Note that with the V6 client, there is no mention in the Infocenter of ASF being able to commit the poison message immediately, even when using a ConnectionConsumer. The way I read it, this means you will probably need to upgrade to the V7 client to get the behavior you are looking for. I will be interested to see whether the PMR results in a similar recommendation.
