Service Fabric: Looking for ways to balance load between services or actors inside one application

Question

We're considering using Service Fabric on-premises to fully or partially replace our old solution built on NServiceBus, though our knowledge of SF is still limited. What we like about NServiceBus is the out-of-the-box ability to declaratively throttle any service to a maximum number of threads. If we have multiple services and one of them starts hiccuping due to some external factor, we do not want the other services affected by that. The "problem" service would just use the maximum number of threads allocated to it in its configuration, and its queue would start growing, but the other services would keep working fine because machine resources are still available. In Service Fabric, if we let our application create as many "problem" actors as it wants, their number will grow uncontrollably and they will consume all server resources.

Any ideas on how we can protect our resources with SF in the situation I described? My first impression is that Service Fabric implements no queuing or actor-throttling mechanism, and that it all has to be built manually.

P.S. I think it cannot be a rare requirement to balance resources between different types of actors inside one application, so that they depend less on each other with regard to resource consumption. I just can't believe SF offers nothing for that.

Thanks

Peter Bons · Accepted Answer · 2018-07-27T09:21:25

I am not sure how you would compare NServiceBus (which is a messaging solution) with Service Fabric, which is a platform for building microservices. Service Fabric supports many different types of workload, so it makes sense that it does not provide out-of-the-box throttling of threads and the like.

Also, what would you expect from Service Fabric when it comes to the resource consumption of actors or services? It is up to you what you want to do and how to react. I wouldn't want SF to kill my actors or throttle service requests automatically. I would expect mechanisms that notify me when it happens, and those are available.

That said, SF does have a mechanism to react to load using metrics. See the docs:

Metrics are the resources that your services care about and which are provided by the nodes in the cluster. A metric is anything that you want to manage in order to improve or monitor the performance of your services. For example, you might watch memory consumption to know if your service is overloaded. Another use is to figure out whether the service could move elsewhere where memory is less constrained in order to get better performance.

Things like Memory, Disk, and CPU usage are examples of metrics. These metrics are physical metrics, resources that correspond to physical resources on the node that need to be managed. Metrics can also be (and commonly are) logical metrics. Logical metrics are things like “MyWorkQueueDepth” or "MessagesToProcess" or "TotalRecords". Logical metrics are application-defined and indirectly correspond to some physical resource consumption. Logical metrics are common because it can be hard to measure and report consumption of physical resources on a per-service basis. The complexity of measuring and reporting your own physical metrics is also why Service Fabric provides some default metrics.

You can define your own custom metrics and have the cluster react to them by moving services to other nodes. Or you could use the health reporting system to issue a health event and have your application or an outside process act on it.
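
To make the idea of a logical metric concrete, here is a minimal sketch of a hand-rolled throttle whose backlog could serve as a value like "MyWorkQueueDepth". It is not Service Fabric code: the Service Fabric SDKs are .NET and Java, and Python is used here only to show the pattern. The `Throttle` class and its `backlog` property are hypothetical names. In a real service you would pass the backlog value to the platform's load-reporting call (for example `ReportLoad` on the partition in the .NET SDK) instead of just reading it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Throttle:
    """Caps concurrent work for one kind of service and tracks its backlog.

    Hypothetical helper for illustration only; not part of any SDK.
    """

    def __init__(self, max_workers: int):
        # The pool size is the hard cap on threads this service may use.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._pending = 0  # items submitted but not yet finished

    def submit(self, fn, *args):
        """Queue one unit of work; it runs when a worker thread is free."""
        with self._lock:
            self._pending += 1
        future = self._pool.submit(fn, *args)
        future.add_done_callback(self._on_done)
        return future

    def _on_done(self, _future):
        # Called once per finished item, whether it succeeded or failed.
        with self._lock:
            self._pending -= 1

    @property
    def backlog(self) -> int:
        """Items queued or running: the candidate logical metric."""
        with self._lock:
            return self._pending

    def shutdown(self):
        """Wait for outstanding work and release the worker threads."""
        self._pool.shutdown(wait=True)
```

With one `Throttle` per service type, a slow "problem" service only fills its own backlog and never takes more than its configured number of threads, which is the behaviour the question describes for NServiceBus.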
