1 vote

We have a bunch of OWIN-based Web API services hosted in an Azure Service Fabric cluster. Each of those services is mapped to a different port on the associated load balancer. The cluster comes with two out-of-the-box probes: FabricGatewayProbe and FabricHttpGatewayProbe. We added our own port rules and used FabricGatewayProbe in all of them.

For some reason, these service endpoints seem to go to sleep after a period of inactivity, because clients of those services start timing out. We tried raising the load balancer idle timeout to 30 minutes (the maximum). That seems to help immediately, but only briefly, and then the timeout errors return.

Where else should I look to resolve this problem?
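For context, the setup described above corresponds roughly to ARM template load-balancing rules like the following. This is a sketch only: the rule names, ports, and the `lbID` variable are illustrative, not taken from the actual template. Note that both rules reference the same FabricGatewayProbe:

```json
{
  "loadBalancingRules": [
    {
      "name": "AppRule8081",
      "properties": {
        "protocol": "Tcp",
        "frontendPort": 8081,
        "backendPort": 8081,
        "idleTimeoutInMinutes": 30,
        "probe": { "id": "[concat(variables('lbID'), '/probes/FabricGatewayProbe')]" }
      }
    },
    {
      "name": "AppRule8082",
      "properties": {
        "protocol": "Tcp",
        "frontendPort": 8082,
        "backendPort": 8082,
        "idleTimeoutInMinutes": 30,
        "probe": { "id": "[concat(variables('lbID'), '/probes/FabricGatewayProbe')]" }
      }
    }
  ]
}
```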

1
Maybe add an actor that periodically pings your service? Also, take a look here for the service lifecycle: github.com/Azure/azure-content/blob/master/articles/… – Rotem Varon
Also, you can listen to the deactivation event and act on it. – Rotem Varon
I reviewed the documentation link. If we don't have any unhandled exceptions, the service should not be closed or aborted. In that case, would we still encounter these timeout issues, in your opinion? – Raghu
You need to have a probe per port, or the load balancer will remove the node from its pool. See the attached link: azure.microsoft.com/en-gb/documentation/articles/… – jimpaine
I looked at the link and did not see any requirement that one probe per port is needed. Did I miss something in this article? – Raghu

1 Answer

3 votes

Further to our comments: I agree that the documentation is open to interpretation, but after doing some testing I can confirm the following.

When you create a new cluster via the portal, it gives you a 1:1 relation of rule to probe. I have also been able to reproduce your issue by modifying one of my existing ARM templates to share a single existing probe, as you have.

On reflection this makes sense: a probe is effectively bound to a service. If you share one probe across rules on different ports, the load balancer has no way to know whether each individual service is actually up. Also, Service Fabric (depending on your instance count settings) will move services between nodes.

So if two services on different ports and different nodes use the same probe, the service not listening on the probe's port will return errors that the request took too long to respond.
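In ARM template terms, the fix is to give each load-balancing rule its own probe on that rule's backend port, along these lines (again a sketch; the probe name, port, and the `lbID` variable are illustrative):

```json
{
  "probes": [
    {
      "name": "AppProbe8081",
      "properties": {
        "protocol": "Tcp",
        "port": 8081,
        "intervalInSeconds": 5,
        "numberOfProbes": 2
      }
    }
  ],
  "loadBalancingRules": [
    {
      "name": "AppRule8081",
      "properties": {
        "protocol": "Tcp",
        "frontendPort": 8081,
        "backendPort": 8081,
        "probe": { "id": "[concat(variables('lbID'), '/probes/AppProbe8081')]" }
      }
    }
  ]
}
```

This way the load balancer health-checks each service on the port it actually listens on, regardless of which node Service Fabric places it on.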

That was a little long-winded, so hopefully a quick illustration will show what I mean.

[Image: illustration of the 1:1 rule-to-probe mapping across nodes]