I have a Service Fabric cluster hosting an 'Orchestrator'-type service which spins up and shuts down other Stateful services to do work, using FabricClient.ServiceManagementClient's CreateServiceAsync and DeleteServiceAsync methods.
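For context, the orchestration calls look roughly like the sketch below. This is a simplified illustration rather than the exact code: the class name WorkerLifecycle, the service type name "WorkerServiceType", the singleton partitioning and the replica counts are all placeholder assumptions.

```csharp
using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading.Tasks;

// Hypothetical helper used by the Orchestrator service; names and settings are illustrative.
public static class WorkerLifecycle
{
    // Spins up a stateful worker service inside an existing application.
    public static Task CreateWorkerAsync(FabricClient client, Uri applicationName, Uri serviceName)
    {
        var description = new StatefulServiceDescription
        {
            ApplicationName = applicationName,
            ServiceName = serviceName,
            ServiceTypeName = "WorkerServiceType", // assumed service type name
            HasPersistedState = true,
            MinReplicaSetSize = 3,                 // assumed replica counts
            TargetReplicaSetSize = 3,
            PartitionSchemeDescription = new SingletonPartitionSchemeDescription()
        };

        // FabricClient.ServiceManager is the FabricClient.ServiceManagementClient instance.
        return client.ServiceManager.CreateServiceAsync(description);
    }

    // Deletes the worker; this is what eventually cancels the worker's CancellationToken.
    public static Task DeleteWorkerAsync(FabricClient client, Uri serviceName)
    {
        return client.ServiceManager.DeleteServiceAsync(new DeleteServiceDescription(serviceName));
    }
}
```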
The work involves processing messages which are stored for a short time within a ReliableConcurrentQueue.
I'm trying to handle the graceful shutdown of these services via the CancellationToken by ensuring that the queue is completely drained of messages before the service is deleted, but have found that the service's access to the ReliableConcurrentQueue is revoked once the CancellationToken is cancelled.
For example, calling StateManager.GetOrAddAsync<T>() from a callback registered with the CancellationToken results in a FabricNotReadableException containing the message "Primary state manager is currently not readable".
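A minimal sketch of the pattern that fails, to make the scenario concrete. It is a reduced illustration, not the full service: the class name WorkerService, the queue name "messages", the string message type and the blocking wait inside the callback are assumptions made for the example.

```csharp
using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data.Collections;
using Microsoft.ServiceFabric.Services.Runtime;

// Hypothetical worker service; names are illustrative.
internal sealed class WorkerService : StatefulService
{
    private const string QueueName = "messages"; // assumed queue name

    public WorkerService(StatefulServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        // Attempt to drain the queue once Service Fabric signals shutdown.
        cancellationToken.Register(() =>
        {
            try
            {
                DrainQueueAsync().GetAwaiter().GetResult();
            }
            catch (FabricNotReadableException)
            {
                // This is where the failure shows up:
                // "Primary state manager is currently not readable".
            }
        });

        // Normal message processing is omitted; wait here until cancellation.
        await Task.Delay(Timeout.Infinite, cancellationToken);
    }

    private async Task DrainQueueAsync()
    {
        // This call throws once the token has been cancelled.
        var queue = await StateManager.GetOrAddAsync<IReliableConcurrentQueue<string>>(QueueName);

        while (true)
        {
            using (var tx = StateManager.CreateTransaction())
            {
                var item = await queue.TryDequeueAsync(tx);
                if (!item.HasValue)
                {
                    break; // queue is empty
                }

                // Process item.Value here, then commit so the dequeue becomes durable.
                await tx.CommitAsync();
            }
        }
    }
}
```

Wrapping the GetOrAddAsync call in a retry loop with a delay makes no difference in this setup; every attempt made after cancellation throws the same exception.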
Reading around, it seems this is expected behaviour:
"In Service Fabric, when a Primary is demoted, one of the first things that happens is that write access to the underlying state is revoked."
https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-reliable-services-lifecycle
Also, the answers to this question suggest that FabricNotReadableException is often a transient issue, and affected calls can be retried. This doesn't seem to be the case in this example; multiple retries at various frequencies/delays all seem to fail the same way.
Is there a way to guarantee that everything in the queue is processed using the combination of Stateful services, Reliable Collections and CancellationTokens? Or should I be looking into storage outside of what Service Fabric can provide?