I'm getting a MessageLockLostException when completing a message on Azure Service Bus after a long-running operation that takes anywhere from 30 minutes to over an hour. I want this process to scale and be resilient to failures, so I hold on to the message lock and renew it well within the default lock duration of 1 minute. However, when I try to complete the message at the end, I get a MessageLockLostException, even though I can see that all the lock renewals occurred at the correct times. I want to scale this up in the future, but there is currently only one instance of the application, and I can confirm that the message still exists on the Service Bus subscription after the error, so the problem is definitely around the lock.
Here are the steps I take.
- Obtain a message and configure a lock
    // Receive a single message, waiting up to 10 seconds.
    var messages = await Receiver.ReceiveAsync(1, TimeSpan.FromSeconds(10)).ConfigureAwait(false);
    var message = messages[0];
    var messageBody = GetTypedMessageContent(message);
    Messages.TryAdd(messageBody, message);

    // Renew the lock periodically for as long as the message is tracked.
    LockTimers.TryAdd(
        messageBody,
        new Timer(
            async _ =>
            {
                if (Messages.TryGetValue(messageBody, out var msg))
                {
                    await Receiver.RenewLockAsync(msg.SystemProperties.LockToken).ConfigureAwait(false);
                }
            },
            null,
            TimeSpan.FromSeconds(Config.ReceiverInfo.LockRenewalTimeThreshold),
            TimeSpan.FromSeconds(Config.ReceiverInfo.LockRenewalTimeThreshold)));
- Perform the long-running process
- Complete the message
    internal async Task Complete(T message)
    {
        if (Messages.TryGetValue(message, out var msg))
        {
            // Renew once more immediately before completing.
            await Receiver.RenewLockAsync(msg.SystemProperties.LockToken).ConfigureAwait(false);
            await Receiver.CompleteAsync(msg.SystemProperties.LockToken).ConfigureAwait(false);
        }
    }
The code above is a stripped-down version of what's there. I removed some try/catch error handling and logging, but I can confirm that when debugging the issue I can see the timer execute on time. It's only the CompleteAsync call that fails.
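To make the renewal pattern easier to reason about in isolation, here is a minimal, self-contained sketch of a renewal loop that can be awaited and stopped before the message is completed. It is not a confirmed fix for the MessageLockLostException. The names `LockRenewalSketch`, `RenewUntilCancelledAsync`, and the `renewLock` delegate are hypothetical; `renewLock` stands in for a call such as `Receiver.RenewLockAsync(lockToken)`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LockRenewalSketch
{
    // renewLock is a hypothetical stand-in for the real renewal call
    // (for example, a lambda wrapping Receiver.RenewLockAsync).
    public static async Task RenewUntilCancelledAsync(
        Func<Task> renewLock,
        TimeSpan interval,
        CancellationToken cancellationToken)
    {
        while (true)
        {
            try
            {
                // Wait one interval, or stop early when cancellation is requested.
                await Task.Delay(interval, cancellationToken).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                return; // Processing has finished; stop renewing.
            }

            // A failure here faults the returned Task, so the caller can observe it.
            // An async lambda passed to System.Threading.Timer is async void and
            // cannot be awaited.
            await renewLock().ConfigureAwait(false);
        }
    }
}
```

With this shape, the caller would start the loop after receiving the message, cancel the token once the long-running work is done, await the returned task so that no renewal is still in flight, and only then call CompleteAsync.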
Additional info:
- Service Bus Topic has Partitioning Enabled
- I have tried renewing at 80% of the 1-minute lock duration (48 seconds), 30% (18 seconds) and 10% (6 seconds)
- I've searched around for an answer, and the closest thing I found was this article, but it's from 2016.
- I couldn't get it to fail in a standalone console application, so I don't know whether it's something I'm doing in my application. I can confirm that the lock renewal occurs for the duration of the processing and returns the correct DateTime for the updated lock. I'd expect CompleteAsync to fail only if the lock had truly been lost.
- I'm using the Microsoft.Azure.ServiceBus NuGet package, version 4.1.3
- My application targets .NET Core 3.1 and uses a Service Bus wrapper package written against .NET Standard 2.1
- The message completes if you don't hold onto it for a long time and occasionally completes even when you do.
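
The renewal intervals listed above follow directly from the lock duration. As a small sketch of that arithmetic, assuming a 60-second lock (the `RenewalIntervalSketch` class and `FromFraction` method are hypothetical helpers, not part of the Service Bus SDK):

```csharp
using System;

public static class RenewalIntervalSketch
{
    // Returns the renewal interval as a fraction of the lock duration.
    // fraction must be strictly between 0 and 1 so renewal happens before expiry.
    public static TimeSpan FromFraction(TimeSpan lockDuration, double fraction)
    {
        if (fraction <= 0 || fraction >= 1)
        {
            throw new ArgumentOutOfRangeException(nameof(fraction));
        }

        // Round to whole milliseconds to avoid floating-point noise.
        return TimeSpan.FromMilliseconds(
            Math.Round(lockDuration.TotalMilliseconds * fraction));
    }
}
```

For a 60-second lock, fractions of 0.8, 0.3 and 0.1 give the 48-, 18- and 6-second intervals mentioned in the list.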
Any help or advice on how I could successfully complete my Service Bus message after an hour would be great.