2 votes

I have a Windows 2008 R2 server that hosts many back-end NServiceBus endpoints. All of the services that rely on the NServiceBus.Host.exe host (installed as Windows Services) are able to interact with MSDTC perfectly, averaging a small handful of concurrent distributed transactions throughout the day. Two small Web.API applications that self-host NServiceBus endpoints (as publishers), however, constantly receive the following error when trying to process subscription requests:

    NServiceBus.Transports.Msmq.MsmqDequeueStrategy Error in receiving messages.
    System.Transactions.TransactionAbortedException: The transaction has aborted. ---> System.Transactions.TransactionManagerCommunicationException: Communication with the underlying transaction manager has failed. ---> System.Runtime.InteropServices.COMException: The Transaction Manager is not available. (Exception from HRESULT: 0x8004D01B)
       at System.Transactions.Oletx.IDtcProxyShimFactory.ConnectToProxy(String nodeName, Guid resourceManagerIdentifier, IntPtr managedIdentifier, Boolean& nodeNameMatches, UInt32& whereaboutsSize, CoTaskMemHandle& whereaboutsBuffer, IResourceManagerShim& resourceManagerShim)
       at System.Transactions.Oletx.DtcTransactionManager.Initialize()
       --- End of inner exception stack trace ---
       at System.Transactions.Oletx.OletxTransactionManager.ProxyException(COMException comException)
       at System.Transactions.Oletx.DtcTransactionManager.Initialize()
       at System.Transactions.Oletx.DtcTransactionManager.get_ProxyShimFactory()
       at System.Transactions.Oletx.OletxTransactionManager.CreateTransaction(TransactionOptions properties)
       at System.Transactions.TransactionStatePromoted.EnterState(InternalTransaction tx)
       --- End of inner exception stack trace ---
       at System.Transactions.TransactionStateAborted.CheckForFinishedTransaction(InternalTransaction tx)
       at System.Transactions.Transaction.Promote()
       at System.Transactions.TransactionInterop.ConvertToOletxTransaction(Transaction transaction)
       at System.Transactions.TransactionInterop.GetDtcTransaction(Transaction transaction)
       at System.Messaging.MessageQueue.StaleSafeReceiveMessage(UInt32 timeout, Int32 action, MQPROPS properties, NativeOverlapped* overlapped, ReceiveCallback receiveCallback, CursorHandle cursorHandle, IntPtr transaction)
       at System.Messaging.MessageQueue.ReceiveCurrent(TimeSpan timeout, Int32 action, CursorHandle cursor, MessagePropertyFilter filter, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType)
       at System.Messaging.MessageQueue.Receive(TimeSpan timeout, MessageQueueTransactionType transactionType)
       at NServiceBus.Transports.Msmq.MsmqDequeueStrategy.ReceiveMessage(Func`1 receive) in c:\BuildAgent\work\31f8c64a6e8a2d7c\src\NServiceBus.Core\Transports\Msmq\MsmqDequeueStrategy.cs:line 313

Some other notes:

  • The identities of the failing application pools and the Log On accounts of the Windows Services are the same user.
  • This actually worked well before a recent reboot: the Web.API services were able to process subscription requests successfully, and they can still publish messages just fine (though publishing does not automatically use MSDTC, and we are not using a TransactionScope explicitly). Since the reboot, we simply get the above error whenever a subscription request message sits in either Web.API publisher's input queue.
  • I've used both procmon.exe and MSDTC tracing and have found nothing of interest. The typical Event Viewer logs also provide no information.
  • All endpoints are running .NET 4.5 and NServiceBus 4.6.
  • We cannot recreate this in any other environment.

Additional notes from the conversation below

  • The thread that throws the exception is pure NServiceBus subscription management; none of "my" code is involved. When the application pool starts the w3wp.exe worker process on demand, NSB spawns a worker thread, unbeknownst to the application, to process subscription requests. It should only ever work across the publisher's input queue and the subscription storage, which is also MSMQ, in a queue right beside the input queue (i.e. no other server is involved, to my knowledge). A minimal configuration sketch follows these notes.
  • The "code" of the website didn't change across reboots, and the application pool stopped and restarted several times before the reboot without issue.
Is this still an issue for you? I've had the same problem because (supposedly) magically our DTC no longer had network access. I know this question is over 3 months old, but I'm curious to see what the resolution was. My problem was fixed via: technet.microsoft.com/en-us/library/Cc753510(v=WS.10).aspx - Justin Self

2 Answers

2 votes

Not really an answer, but too long for a comment.

What part of your operation requires DTC? A transaction is escalated to a distributed transaction automatically when needed, usually when you are talking to two different DTC-supporting pieces of infrastructure (e.g. MSMQ and a database) within the same transaction.
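A quick way to see whether a given operation has actually escalated is to check the ambient transaction's distributed identifier. This is just an illustrative diagnostic sketch:

    using System;
    using System.Transactions;

    static class EscalationProbe
    {
        // Call from inside any code running under a TransactionScope.
        public static void LogEscalation()
        {
            var tx = Transaction.Current;
            if (tx == null) return;

            // DistributedIdentifier stays Guid.Empty while the transaction is
            // still a lightweight local one; it becomes non-empty only after
            // promotion to MSDTC.
            Console.WriteLine("Escalated to MSDTC: {0}",
                tx.TransactionInformation.DistributedIdentifier != Guid.Empty);
        }
    }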

You said you tested via DTC tracing; do you mean DTCPing? Did you run it on both machines (or on all machines, if more than two are involved in the transaction)? The tool is pretty esoteric, and its output can be confusing.

Also, if it did work before the reboot, is it possible the reboot reset firewall settings? Firewalls are a common cause of DTC problems.

Also, I assume you checked and rechecked your DTC settings on the local machine? Did you ensure that your MSMQ queues are set up to be transactional?
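You can verify the transactional flag from code as well; a small sketch using System.Messaging (the queue path is a placeholder):

    using System;
    using System.Messaging;

    static class QueueCheck
    {
        public static void Main()
        {
            // Placeholder; substitute the publisher's actual input queue.
            const string path = @".\private$\webapi.publisher.input";

            if (!MessageQueue.Exists(path))
                MessageQueue.Create(path, true); // second argument = transactional

            using (var queue = new MessageQueue(path))
                Console.WriteLine("Transactional: {0}", queue.Transactional);
        }
    }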

From your comments:

Note that this particular failure occurs when attempting to dequeue a message from a local private MSMQ queue [...]

The stack trace makes it appear that that's all it's doing, but I suspect that as it attempts the dequeue it is also trying to enlist a transaction that spans multiple servers. See below.

Why MSDTC? It's the original way to support exactly-once messaging in NServiceBus (see here).

Right, but what I'm asking is why the particular operation requires a distributed transaction. If all a handler is doing is reading from a queue and (for example) writing output to the console, MSDTC will never be enlisted, even though the handler is wrapped in a transaction scope. It will simply use a local transaction to read from the queue. The escalation to a distributed transaction is automatic, and only happens when it is needed to support multiple bits of infrastructure.
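For contrast, a receive that uses a native MSMQ transaction is entirely local and never touches MSDTC (the queue path is again a placeholder):

    using System;
    using System.Messaging;

    static class LocalReceive
    {
        public static void Main()
        {
            using (var queue = new MessageQueue(@".\private$\some.queue"))
            using (var tx = new MessageQueueTransaction())
            {
                tx.Begin();
                // Native MSMQ transaction: purely local, no DTC involvement.
                var message = queue.Receive(TimeSpan.FromSeconds(5), tx);
                Console.WriteLine(message.Label);
                tx.Commit();
            }
        }
    }

Note that your stack trace instead goes through TransactionInterop.GetDtcTransaction, i.e. the receive was handed to an ambient System.Transactions transaction that ended up being promoted.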

So if you recently deployed code in a handler that writes data to a new database server, the transaction now enlists that new server, and that may be where the failure is happening.

So determining all the servers involved in the distributed transaction is the first step. The next step would be to check the DTC settings on all involved servers. If DTC settings aren't the problem, I'd recommend testing communication between the servers using DTCPing. The NServiceBus documentation has some good instructions for using DTCPing.

1 vote

What "fixed" this for us in the production environment was adding the application pool identity user to the local Administrators group on the server. Unfortunately we don't have time to determine what setting required that security setup, as this isn't a required configuration in other similar servers. Also, this isn't the most desirable solution from a security perspective, but in our particular situation, we're willing to live with it.