
After extensive research of the Azure documentation, we are still missing some important details about how Azure's built-in cloud service auto-scaling works. Our cloud service is a simple ASP.NET application with a single web role. By default we deploy 2 instances to get SLA coverage, and then scale up or down by CPU usage, one instance at a time. We use startup tasks (defined in the csdef) to configure IIS, and we use RoleEntryPoint to run custom warm-up logic in the OnStart event. We are sure that the startup tasks and OnStart are not failing with errors.
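For illustration, here is a minimal sketch of the kind of RoleEntryPoint warm-up logic described above (the endpoint name "Endpoint1" and the warm-up request are placeholders, not our actual code):

    using System;
    using System.Net;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WebRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            // Hypothetical warm-up: hit the local IIS endpoint once so the site
            // is compiled before the instance starts taking load-balanced traffic.
            // "Endpoint1" is the default endpoint name from the csdef template and
            // is an assumption here; the real warm-up work is application-specific.
            try
            {
                var endpoint = RoleEnvironment.CurrentRoleInstance
                    .InstanceEndpoints["Endpoint1"].IPEndpoint;
                var warmUpUrl = string.Format("http://{0}:{1}/", endpoint.Address, endpoint.Port);

                using (var client = new WebClient())
                {
                    client.DownloadString(warmUpUrl);
                }
            }
            catch (WebException)
            {
                // Log and swallow warm-up failures so OnStart itself never fails.
            }

            return base.OnStart();
        }
    }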

The following questions are derived from my observations and are intended to clarify whether this is expected behavior.

  1. When a cloud service is scaled either up or down, each instance that is currently running is taken out of the load balancer for some short amount of time and does not serve requests. Is this true and expected?

  2. topologyChangeDiscovery="Blast" in the csdef does not change this behavior; instances are still taken out of the load balancer during scale operations. Is this true and expected?

  3. If a cloud service has N instances and it scales to N+1, there is some period when only N-1 instances are serving requests. This period is roughly N * (time required for a single instance configuration change). Is this true and expected?

  4. Is there a way to set up auto-scaling so that all instances currently in the cloud service keep serving requests without interruption during scale operations? (By any means, not only using Azure's built-in auto-scaling.)

UPDATE:

I have performed a test to check which instances actually serve requests during scale events. A simple console app polls the cloud service and records which instance responds to each request. I have also added screenshots of all changes in the Azure portal to the log files.

Here are the results: Scale up from 2 to 3 instances: https://gist.github.com/samfromlv/8029ff0b3fdb3e6bd02a#file-scaleuplog_withscreens-txt

Scale down from 3 to 2 instances: https://gist.github.com/samfromlv/8029ff0b3fdb3e6bd02a#file-scaledownlogs_with_screens-txt

Console app source code and logs format description: https://gist.github.com/samfromlv/8029ff0b3fdb3e6bd02a
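The real code is in the gist above; for quick reference, the idea is roughly the following sketch (the URL and the assumption that the service echoes the responding instance's ID in the body are placeholders):

    using System;
    using System.Net;
    using System.Threading;

    class Program
    {
        static void Main()
        {
            // Hypothetical endpoint that returns RoleEnvironment.CurrentRoleInstance.Id;
            // the real URL and response format are described in the gist.
            const string url = "http://example.cloudapp.net/instance";

            while (true)
            {
                try
                {
                    using (var client = new WebClient())
                    {
                        string instanceId = client.DownloadString(url);
                        Console.WriteLine("{0:O}  {1}", DateTime.UtcNow, instanceId.Trim());
                    }
                }
                catch (WebException ex)
                {
                    // A failed request during a scale event is itself useful data.
                    Console.WriteLine("{0:O}  ERROR: {1}", DateTime.UtcNow, ex.Status);
                }

                Thread.Sleep(500);
            }
        }
    }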

Your question is very similar to this question, which has a few answers discussing the behavior. – David Makogon
There is no role recycling in our case (the IIS process is not restarted). The question you mentioned specifically asks what causes recycling. – samfromlv

1 Answer


The moment you're dealing with scaling down to 1 instance or up from 1 instance, you're going to experience unpleasant results, because Azure takes the existing "good" instance out of the load balancer.

Assuming you're only dealing with 2+ instances and never scale down below 2 instances, here are some responses based on 5 years of running the CloudMonix/AzureWatch auto-scaling services for Azure:

  1. Only if there is only 1 instance left, as mentioned above.

  2. Blast should have little effect on the load balancer. But if all instances are rebooting when a topology event occurs, make sure you don't accidentally signal that a reboot is required during the topology-change event in your Web/WorkerRole.cs (see the sketch after this list).

  3. If you have N instances, where N >= 2, and it scales up to N+1, it'll take about 10 minutes to get you to N+1. At no point should you have only N-1 active instances in that case. There was a doc from Microsoft that explained how quickly it scales up to many instances; the limit isn't tied to the starting instance count, but to how many new instances it spins up. I believe up to 100 new instances are guaranteed to get spun up in either 30 or 60 minutes. Don't quote me on that.

  4. You can use third-party services, such as the one I'm affiliated with, CloudMonix. But in all cases the standard scaling issues apply when you're dealing with 1 instance and trying to either scale down to it or up from it.
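To make point 2 concrete: in the ServiceRuntime API the "reboot" corresponds to setting e.Cancel = true in a RoleEnvironment.Changing handler. A minimal sketch of a handler that explicitly leaves running instances alone on topology changes (your actual handler may differ):

    using System.Linq;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WebRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            RoleEnvironment.Changing += (sender, e) =>
            {
                // Setting e.Cancel = true tells the fabric to take this instance
                // offline and restart it before applying the change. Leave it
                // false for topology changes so running instances are not recycled.
                if (e.Changes.Any(change => change is RoleEnvironmentTopologyChange))
                {
                    e.Cancel = false;
                }
            };

            return base.OnStart();
        }
    }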

HTH