How to restart Service Fabric scale set machines

Question

We have a service fabric cluster with one scale set (primary) with 5 nodes. There was a memory leak in one of our services which drained all of the available memory on the nodes and eventually other services failed. For instance some Powershell commands don't work now. In the Service Fabric Explorer everything is healthy and we don't have any errors or warnings. Is it possible to restart the machines and what is the best way to do it so we could restore the machines to their initial state where all of the services are working?

In the scale set when scaling down it removes the node with the highest index, so it won't help to follow the documentation, scale up and then remove the nodes that are faulty.

What would happen if we restart the scale set nodes one buy one? I see that service fabric handles it - disables the node and activates it afterwards. But from the documentation in silver tier we need to have 5 nodes up and running all the time. So before restarting any of the nodes should we scale up, add one more node and then proceed with the restart?

Diego Mendes Diego Mendes · Accepted Answer · 2018-10-04T23:58:51

If the failing nodes has healthy services still running, the best approach is disable the node first with Disable-ServiceFabricNode command, so that any healthy services are moved out of the node with less impact possible.

Once the services are moved, in some cases, just a Restart-ServiceFabricNode command can kill all locked services and come back healthy, without actually restaring the VM.

In last case, you might need to restart the VM via Powershell or Azure Portal to get a fresh start to the node.

If your cluster is running on high density load, you might need to scale up first to bring capacity to the cluster reallocate the services.

How to restart Service Fabric scale set machines

1 Answers