We have bought a lot of licenses, done a lot of testing with a lot of promissing results and are on the brink of our first release :).
But now we have hit a big bump on the road, meaning we might have to abandon the bus, if we cant explain and fix it :/.
Our Distributor suddenly have control error messages like the one below:
<?xml version="1.0"?>
<ArrayOfHeaderInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<HeaderInfo>
<Key>NServiceBus.ControlMessage</Key>
<Value>True</Value>
</HeaderInfo>
<HeaderInfo>
<Key>NServiceBus.Distributor.WorkerCapacityAvailable</Key>
<Value>20</Value>
</HeaderInfo>
<HeaderInfo>
<Key>NServiceBus.Distributor.WorkerStarting</Key>
<Value>True</Value>
</HeaderInfo>
<HeaderInfo>
<Key>CorrId</Key>
<Value>58dd98f5-9ac0-44fb-8604-3a0f06787a35\295075</Value>
</HeaderInfo>
<HeaderInfo>
<Key>NServiceBus.ExceptionInfo.Reason</Key>
<Value>ProcessingFailed</Value>
</HeaderInfo>
<HeaderInfo>
<Key>NServiceBus.ExceptionInfo.ExceptionType</Key>
<Value>System.InvalidOperationException</Value>
</HeaderInfo>
<HeaderInfo>
<Key>NServiceBus.ExceptionInfo.HelpLink</Key>
</HeaderInfo>
<HeaderInfo>
<Key>NServiceBus.ExceptionInfo.Message</Key>
<Value>Property ResponseQueue was not retrieved when receiving the message. Ensure that the PropertyFilter is set correctly.</Value>
</HeaderInfo>
<HeaderInfo>
<Key>NServiceBus.ExceptionInfo.Source</Key>
<Value>NServiceBus.Core</Value>
</HeaderInfo>
<HeaderInfo>
<Key>NServiceBus.ExceptionInfo.StackTrace</Key>
<Value> at NServiceBus.Unicast.Transport.Transactional.TransactionalTransport.ProcessMessage(TransportMessage m) in c:\BuildAgent\work\nsb.master_6\src\impl\unicast\transport\NServiceBus.Unicast.Transport.Transactional\TransactionalTransport.cs:line 312
at NServiceBus.Unicast.Transport.Transactional.TransactionalTransport.ReceiveMessage() in c:\BuildAgent\work\nsb.master_6\src\impl\unicast\transport\NServiceBus.Unicast.Transport.Transactional\TransactionalTransport.cs:line 275
at NServiceBus.Utils.TransactionWrapper.RunInTransaction(Action callback, IsolationLevel isolationLevel, TimeSpan transactionTimeout) in c:\BuildAgent\work\nsb.master_6\src\utils\TransactionWrapper.cs:line 32
at NServiceBus.Unicast.Transport.Transactional.TransactionalTransport.Process() in c:\BuildAgent\work\nsb.master_6\src\impl\unicast\transport\NServiceBus.Unicast.Transport.Transactional\TransactionalTransport.cs:line 220</Value>
</HeaderInfo>
<HeaderInfo>
<Key>NServiceBus.OriginalId</Key>
<Value>58dd98f5-9ac0-44fb-8604-3a0f06787a35\295075</Value>
</HeaderInfo>
<HeaderInfo>
<Key>NServiceBus.FailedQ</Key>
<Value>someservice.processId.distributor.control@testservices01</Value>
</HeaderInfo>
<HeaderInfo>
<Key>NServiceBus.TimeOfFailure</Key>
<Value>2013-04-30 10:07:40:750707 Z</Value>
</HeaderInfo>
</ArrayOfHeaderInfo>
Google tells us that this might relate to some threading issues and maybe even the way that NSB is implemented using peek/receive.
The above exception relates to this file at GitHub: https://github.com/NServiceBus/NServiceBus/blob/master/src/impl/unicast/transport/NServiceBus.Unicast.Transport.Transactional/TransactionalTransport.cs
Details about our implementation:
We use a custom IManageUnitsOfWork, due to some legacy issue which means no DTC against the DB yet. I do not think that this could be the cause, but i think its worth mentioning. This is the implemenation:
public class ManagedUnitOfWorkWithDtcSuppression : IManageUnitsOfWork
{
private readonly IContainer _container;
private IUnitOfWork _unitOfWork;
private readonly TransactionScope _scope;
public ManagedUnitOfWorkWithDtcSuppression()
{
_scope = new TransactionScope(TransactionScopeOption.Suppress);
_container = ObjectFactory.GetInstance<IContainer>();
}
public void Begin()
{
_unitOfWork = _container.GetInstance<IUnitOfWork>();
}
public void End(Exception exception = null)
{
if (exception == null)
{
_unitOfWork.Commit();
}
_unitOfWork.Dispose();
_scope.Complete();
_scope.Dispose();
}
}
Also we have a special setup where we run 4 identical AppDomains inside 1 service running, which means that when we start the service as a distributor, there are actually 4 distributors running. But these are pr. definition completely isolated from one another. The IBus is unique for each AppDomain, this has been tested.
Our Distributor configuration looks like this:
return NServiceBus.Configure.With()
.DefineEndpointName(queuePrefix)
.Log4Net(ObjectFactory.GetInstance<IServiceBusLog>().Build())
.StructureMapBuilder()
.JsonSerializer()
.AsMasterNode()
.RunDistributorWithNoWorkerOnItsEndpoint()
.MsmqTransport()
.IsTransactional(true)
.DisableTimeoutManager()
.DisableSecondLevelRetries()
.UnicastBus()
.CreateBus()
.Start(() => NServiceBus.Configure.Instance.ForInstallationOn<NServiceBus.Installation.Environments.Windows>().Install());
Question:
What is going on here?
Are we screwing with NSB because we are using DTC supression, is there a MSMQ bug or is there a NSB bug?