I have a Windows 2008 R2 server that hosts many back end NServiceBus endpoints. All of the services that rely on the NServiceBus.Host.exe host (installed as Windows Services) are able to interact with MSDTC perfectly, averaging a small handful of concurrent distributed transactions throughout the day. There are 2 small Web.API applications, however, that self host NServiceBus endpoints (as publishers) that constantly receive the following error when trying to process subscription requests:
NServiceBus.Transports.Msmq.MsmqDequeueStrategy Error in receiving messages. System.Transactions.TransactionAbortedException: The transaction has aborted. ---> System.Transactions.TransactionManagerCommunicationException: Communication with the underlying transaction manager has failed. ---> System.Runtime.InteropServices.COMException: The Transaction Manager is not available. (Exception from HRESULT: 0x8004D01B) at System.Transactions.Oletx.IDtcProxyShimFactory.ConnectToProxy(String nodeName, Guid resourceManagerIdentifier, IntPtr managedIdentifier, Boolean& nodeNameMatches, UInt32& whereaboutsSize, CoTaskMemHandle& whereaboutsBuffer, IResourceManagerShim& resourceManagerShim) at System.Transactions.Oletx.DtcTransactionManager.Initialize() --- End of inner exception stack trace --- at System.Transactions.Oletx.OletxTransactionManager.ProxyException(COMException comException) at System.Transactions.Oletx.DtcTransactionManager.Initialize() at System.Transactions.Oletx.DtcTransactionManager.get_ProxyShimFactory() at System.Transactions.Oletx.OletxTransactionManager.CreateTransaction(TransactionOptions properties) at System.Transactions.TransactionStatePromoted.EnterState(InternalTransaction tx) --- End of inner exception stack trace --- at System.Transactions.TransactionStateAborted.CheckForFinishedTransaction(InternalTransaction tx) at System.Transactions.Transaction.Promote() at System.Transactions.TransactionInterop.ConvertToOletxTransaction(Transaction transaction) at System.Transactions.TransactionInterop.GetDtcTransaction(Transaction transaction) at System.Messaging.MessageQueue.StaleSafeReceiveMessage(UInt32 timeout, Int32 action, MQPROPS properties, NativeOverlapped* overlapped, ReceiveCallback receiveCallback, CursorHandle cursorHandle, IntPtr transaction) at System.Messaging.MessageQueue.ReceiveCurrent(TimeSpan timeout, Int32 action, CursorHandle cursor, MessagePropertyFilter filter, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType) at System.Messaging.MessageQueue.Receive(TimeSpan timeout, MessageQueueTransactionType transactionType) at NServiceBus.Transports.Msmq.MsmqDequeueStrategy.ReceiveMessage(Func`1 receive) in c:\BuildAgent\work\31f8c64a6e8a2d7c\src\NServiceBus.Core\Transports\Msmq\MsmqDequeueStrategy.cs:line 313
Some other notes:
- Both the erroring ApplicationPools' identities and the Windows Services' Log On users are the same.
- This actually worked well before a recent reboot, as the Web.API services were able to successfully process subscription requests, and are able to publish messages just fine (though publishing does not automatically use MSDTC, and we are not using a TransactionScope explicitly). Since the local reboot, we simply get the above error if a subscription request message sits in either of the Web.API publisher's input queue.
- I've used both procmon.exe and MSDTC tracing and have found nothing of interest. The typical event viewer logs also do not provide any information.
- All endpoints are running .NET 4.5 and NServiceBus 4.6
- We cannot recreate this in any other environment.
Additional notes from below conversations
- The thread which throws the exception is pure NServiceBus subscription management where none of "my" code is involved. When the application pool starts the w3wp.exe worker process on demand, NSB is spawning a worker thread unbeknownst to the application to process subscription requests. It should only ever work across the publisher's input queue and the subscription storage, which I'm using MSMQ for that as well, in a queue right beside the other (i.e. no other server is involved to my knowledge).
- The "code" of the website didn't change across reboots, and the application pool stopped and restarted several times before the reboot without issue.