2
votes

I have a stateful Service Fabric application. In the application parameter file I have PartitionCount set to 20.

When I deploy the service to the cluster I get 20 partitions, but some of them show a status of "Reconfiguring" and eventually go into a warning state with this unhealthy event:

Unhealthy event: SourceId='System.FM', Property='State', HealthState='Warning', ConsiderWarningAsError=false. Partition reconfiguration is taking longer than expected.

However, the health of the replicas inside that partition shows as "OK".

What is actually happening when a partition is in the "Reconfiguring" state? Why does this warning occur?

1
Are there any messages from ETW? To diagnose this, run on your local development cluster with just 1 partition and make sure that works. Then try locally with 2 partitions and make sure that works too. If both do, it could be that 20 partitions are too many for the cluster. - Nick Randell

1 Answer

4
votes

Reconfiguration of a stateful service is when Service Fabric is shuffling replicas around the cluster. This occurs any time the system needs to make a change to replica placement. That can be failover to keep replicas available during machine downtime or upgrades, or resource balancing to keep workloads spread evenly across the cluster. The latter occurs immediately when you deploy a new service, as the system has to find a place to put the replicas and then balance everything out.

If reconfiguration is taking longer than expected, there's a good chance a replica is either not responding to a change-role or close action (e.g., your service code is not responding to the cancellation token in RunAsync), or a replica is failing to start (e.g., your communication listener code throws an exception in OpenAsync).
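To make the cancellation token point concrete, here is a minimal sketch in plain .NET. It does not use the Service Fabric SDK: `CancellationDemo`, `CooperativeRunAsync`, `UncooperativeRunAsync`, and `StopsWithinAsync` are hypothetical names made up for illustration, and `StopsWithinAsync` is only a stand-in for the runtime waiting on your RunAsync after it signals the token. The idea is simply that a loop which checks the token and passes it to its awaits stops promptly, while a loop that ignores the token keeps running.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Cooperative loop: the token is checked on every iteration and is also
    // passed to Task.Delay, so a cancellation request ends the loop promptly.
    public static async Task CooperativeRunAsync(CancellationToken cancellationToken)
    {
        while (true)
        {
            cancellationToken.ThrowIfCancellationRequested();
            // Placeholder for one unit of real work.
            await Task.Delay(TimeSpan.FromMilliseconds(100), cancellationToken);
        }
    }

    // Uncooperative loop: the token is ignored, so the task keeps running
    // for about 3 seconds no matter when cancellation is requested.
    public static async Task UncooperativeRunAsync(CancellationToken cancellationToken)
    {
        for (int i = 0; i < 30; i++)
        {
            await Task.Delay(TimeSpan.FromMilliseconds(100));
        }
    }

    // Hypothetical stand-in for the runtime: start the loop, request
    // cancellation, and report whether the loop ended within the grace period.
    public static async Task<bool> StopsWithinAsync(
        Func<CancellationToken, Task> run, TimeSpan grace)
    {
        using var cts = new CancellationTokenSource();
        Task running = run(cts.Token);
        await Task.Delay(TimeSpan.FromMilliseconds(250)); // let the loop spin briefly
        cts.Cancel();
        // WhenAny does not throw if 'running' ends in the Canceled state.
        Task finished = await Task.WhenAny(running, Task.Delay(grace));
        return finished == running;
    }

    public static async Task Main()
    {
        Console.WriteLine(await StopsWithinAsync(CooperativeRunAsync, TimeSpan.FromSeconds(1)));
        Console.WriteLine(await StopsWithinAsync(UncooperativeRunAsync, TimeSpan.FromSeconds(1)));
    }
}
```

Run as a console app, this should print True for the cooperative loop and False for the uncooperative one. In a real service the same pattern would go inside your RunAsync override: check the token in every loop iteration and pass it to any awaited call that accepts one.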