
We are using Azure Service Fabric (a Stateless Service) that receives messages from an Azure Service Bus message queue and processes them. The tasks generally take between 5 minutes and 5 hours.

When it's busy we want to scale out servers, and when it gets quiet we want to scale back in again.

How do we scale in without interrupting long-running tasks? Is there a way to tell Service Fabric which server is free to scale in?

As far as I know, SF doesn't scale automatically and you have to update your cluster to scale out by adding more VMs. – Sean Feldman
Pretty sure you can do it via a virtual machine scale set? – tank104
That's what I meant. I'm just not sure there's something that would add/remove VMs to/from the VMSS based on the status of a service. Saying that, if you have a stateless service OOTB. – Sean Feldman
Ah right :) What I was hoping is that there is some way you could select which node to shut down. So right now VMSS would just scale down one node even if work is still happening on it? (Which is what I thought/was worried about) – tank104
One thing I thought of - is it possible to scale down the node which has the lowest CPU usage? – tank104

1 Answer

  1. Azure Monitor Custom Metric

    • Integrate your SF service with EventFlow, for instance to have it send logs to Application Insights.

    • While a task is being processed, emit log events indicating that it is in progress.

    • Configure a custom metric in Azure Monitor so that scale-in happens only when there are no logs indicating that a machine has in-progress tasks.

The trade-off here is that scale-in cannot happen until all in-progress tasks have finished.
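The decision rule behind this first approach can be sketched in a few lines of Python. This is only an illustration of the logic, not Azure Monitor configuration: the function name `safe_to_scale_in`, the node names, and the 10-minute quiet period are all assumptions made up for the example. In practice the timestamps would come from whatever log store your service writes to.

```python
from datetime import datetime, timedelta, timezone


def safe_to_scale_in(last_progress_log, now, quiet_period):
    """Return True only if no node logged an in-progress task within quiet_period.

    last_progress_log maps a (hypothetical) node name to the timestamp of the
    most recent "task in progress" log event seen for that node.
    """
    return all(now - ts >= quiet_period for ts in last_progress_log.values())


# Hypothetical data: node_1 reported an in-progress task two minutes ago.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
logs = {
    "node_0": now - timedelta(minutes=45),
    "node_1": now - timedelta(minutes=2),
}

# node_1 is still busy, so scale-in is blocked for the whole cluster.
print(safe_to_scale_in(logs, now, timedelta(minutes=10)))
```

The sketch makes the trade-off visible: a single busy node blocks scale-in for every node, because the rule looks at the cluster as a whole rather than at individual machines.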

  2. There is a good article that explains how to Scale a Service Fabric cluster programmatically
  3. Here is another approach which requires a bit of coding - Automate manual scaling

    • Develop another service, either as part of the SF application or as a VM extension. The point is to have this service running on every node in the cluster so it can track the status of task execution.

    • There are well-defined steps for manually removing an SF node from the cluster:

    • Run Disable-ServiceFabricNode with intent ‘RemoveNode’ to disable the node you’re going to remove (the highest instance in that node type).

    • Run Get-ServiceFabricNode to make sure that the node has indeed transitioned to disabled. If not, wait until the node is disabled. You cannot hurry this step.
    • Follow the sample/instructions in the quick start template gallery to change the number of VMs by one in that node type. The instance removed is the highest VM instance.
    • And so forth. Find more info in Scale a Service Fabric cluster in or out using auto-scale rules. The takeaway here is that these steps can be automated.

Implement the scaling logic in the new service: have it monitor which nodes have finished their tasks and are sitting idle, then scale those nodes in using the steps described above.
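As a rough illustration of that scaling logic, here is a minimal Python sketch. Everything named in it is hypothetical: `NodeState`, `pick_scale_in_candidate`, `scale_in_once`, and the three injected callables (`disable_node`, `is_disabled`, `reduce_vm_count`) stand in for however you choose to invoke Disable-ServiceFabricNode, Get-ServiceFabricNode, and the VM count change. It assumes, as noted above, that the instance removed is always the highest one, so only that node is considered.

```python
import time
from dataclasses import dataclass


@dataclass
class NodeState:
    name: str
    instance: int       # VM instance index within the node type
    running_tasks: int  # tasks currently in progress on this node


def pick_scale_in_candidate(nodes, min_nodes):
    """Return the highest-instance node if it is idle, otherwise None."""
    if len(nodes) <= min_nodes:
        return None  # never shrink below the configured minimum
    highest = max(nodes, key=lambda n: n.instance)
    return highest if highest.running_tasks == 0 else None


def scale_in_once(nodes, min_nodes, disable_node, is_disabled,
                  reduce_vm_count, poll_seconds=30.0, sleep=time.sleep):
    """Run one scale-in pass; return the removed node's name or None."""
    candidate = pick_scale_in_candidate(nodes, min_nodes)
    if candidate is None:
        return None
    disable_node(candidate.name)             # stand-in for the disable step
    while not is_disabled(candidate.name):   # this step cannot be hurried
        sleep(poll_seconds)
    reduce_vm_count(len(nodes) - 1)          # drop the VM count by one
    return candidate.name


# Dry run with fake callables that only record what would have been done.
actions = []
cluster = [NodeState("node_0", 0, 2), NodeState("node_1", 1, 0)]
removed = scale_in_once(
    cluster,
    min_nodes=1,
    disable_node=lambda name: actions.append(("disable", name)),
    is_disabled=lambda name: True,
    reduce_vm_count=lambda count: actions.append(("resize", count)),
)
print(removed, actions)
```

One consequence of this design is that an idle lower-instance node cannot be removed while the highest instance is still busy; the pass simply returns None and the service tries again later.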

Hopefully it makes sense.


Thanks a lot to @tank104 for the help on elaborating my answer!