0
votes

We have a deployment to Azure Cloud Services with two WebRole instances, using Azure SDK 3.0 on .NET 4.5.2 and OS Family "4" (Windows Server 2012 R2).

When the web application starts, we want to load a cache from blob storage, which takes around 10 minutes. (We have looked into moving this work elsewhere but currently can't.)

Then, when the IIS application pool recycles, we want the site to stay up.

Currently the default IIS settings with Cloud Services are:

  • Not to start on load (autoStart / startMode)
  • To idle after 20 minutes of inactivity (idleTimeout)
  • To recycle every 29 hours (periodicRestart)
  • To report failures as HTTP 503s (loadBalancerCapabilities)

Because we run two web role instances by default, we want to recycle their app pools at different times. Ideally, existing connections to the site would be redirected to the other instance while one instance is loading the cache.

So far, we have a startup task script that reconfigures the IIS app pools:

appcmd set config -section:system.applicationHost/applicationPools 

with

  /applicationPoolDefaults.autoStart:"True"
  /applicationPoolDefaults.startMode:"AlwaysRunning"
  /applicationPoolDefaults.processModel.idleTimeout:"00:00:00" 
  /applicationPoolDefaults.recycling.logEventOnRecycle:"Time,Requests,Schedule,Memory,IsapiUnhealthy,OnDemand,ConfigChange,PrivateMemory"
  /applicationPoolDefaults.recycling.periodicRestart.time:"00:00:00" 
  /~"applicationPoolDefaults.recycling.periodicRestart.schedule" 
  /+"applicationPoolDefaults.recycling.periodicRestart.schedule.[value='06:00:00']" 
  /applicationPoolDefaults.failure.loadBalancerCapabilities:"TcpLevel" 

e.g.

%windir%\system32\inetsrv\appcmd set config -section:applicationPools /applicationPoolDefaults.autoStart:"True" /commit:apphost

As for code, we have looked at reporting a Busy status until the cache has loaded. This doesn't appear to re-route the traffic:

RoleEnvironment.StatusCheck += WebRoleEnvironment_StatusCheck;

with

        if (Busy)
        {
            e.SetBusy();
        }

The drawback is that the cache is loaded in Application_Start, because of the containers it requires. I think it would be too hard to move LoadCache() into the OnStart() of the RoleEntryPoint.

Note: We also have "Keep-alive" on by default.

Questions:

  1. How do we take a WebHost offline while it loads the cache?
  2. Should we change the IIS settings? https://azure.microsoft.com/en-gb/blog/iis-reset-on-windows-azure-web-role/
  3. Should we use IIS 8.0 Application Initialization? http://fabriccontroller.net/iis-8-0-application-initialization-module-in-a-windows-azure-web-role/
  4. What should loadBalancerCapabilities be set to? https://docs.microsoft.com/en-us/iis/configuration/system.applicationhost/applicationpools/add/failure
  5. Should we try to stagger recycles? What about when we scale out (add more instances)? Does Azure prevent role instances from being recycled at the same time?

3 Answers

1
votes

See https://blogs.msdn.microsoft.com/kwill/2012/09/19/role-instance-restarts-due-to-os-upgrades/, specifically Common Issues #5:

If your website takes several minutes to warmup (either standard IIS/ASP.NET warmup of precompilation and module loading, or warming up a cache or other app specific tasks) then your clients may experience an outage or random timeouts. After a role instance restarts and your OnStart code completes then your role instance will be put back in the load balancer rotation and will begin receiving incoming requests. If your website is still warming up then all of those incoming requests will queue up and time out. If you only have 2 instances of your web role then IN_0, which is still warming up, will be taking 100% of the incoming requests while IN_1 is being restarted for the Guest OS update. This can lead to a complete outage of your service until your website is finished warming up on both instances. It is recommended to keep your instance in OnStart, which will keep it in the Busy state where it won't receive incoming requests from the load balancer, until your warmup is complete. You can use the following code to accomplish this:

 using System.Net;
 using System.Net.Sockets;
 using Microsoft.WindowsAzure.ServiceRuntime;

 public class WebRole : RoleEntryPoint {
   public override bool OnStart() {
     // For information on handling configuration changes
     // see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.
     // Find this instance's IPv4 address.
     IPHostEntry ipEntry = Dns.GetHostEntry(Dns.GetHostName());
     string ip = null;
     foreach (IPAddress ipaddress in ipEntry.AddressList) {
       if (ipaddress.AddressFamily == AddressFamily.InterNetwork) {
         ip = ipaddress.ToString();
       }
     }
     // Request the site so that warm-up runs before OnStart returns;
     // the instance stays Busy (out of the load balancer) until then.
     string urlToPing = "http://" + ip;
     HttpWebRequest req = (HttpWebRequest)WebRequest.Create(urlToPing);
     using (WebResponse resp = req.GetResponse()) { }
     return base.OnStart();
   }
 }
0
votes

Based on your description and my experience, I think it's almost impossible to satisfy all of your needs in the current setup; doing so would need changes to the architecture.

Here are my ideas:

  1. I guess the cache blob is too big, which is why loading it from blob storage takes so long. To reduce that time, you could split the cache blob into many smaller ones based on usage statistics and load them concurrently. Alternatively, use table storage instead of blob storage as an L2 cache: query the cache data from table storage and keep it in memory as an L1 cache with an expiration time. You could even store the cache data in Azure Redis Cache, which is faster than table storage.
  2. Make sure there is a retry mechanism for keep-alive connections. An existing connection can then be redirected to another role instance when a role instance is stopped or restarted.
  3. To restart a role instance, you can use the Reboot Role Instance REST API.
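
Point 1 suggests splitting the cache blob into smaller pieces and loading them concurrently. A minimal sketch of that idea follows, written in Python for brevity rather than in the role's C#. `fetch_shard` is a hypothetical stand-in for the real blob download, and the shard names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_shard(name):
    # Hypothetical stand-in for downloading one cache shard from blob storage.
    return {name: f"data-for-{name}"}


def load_cache(shard_names, fetch=fetch_shard, max_workers=8):
    """Fetch all shards concurrently and merge them into one in-memory dict."""
    cache = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order and re-raises a fetch error
        # when the failing shard's result is reached.
        for part in pool.map(fetch, shard_names):
            cache.update(part)
    return cache
```

Whether this shortens the 10 minute load depends on where the time goes: it should help if the download is latency-bound, less so if the instance's bandwidth or deserialization is the bottleneck.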

Hope it helps.

0
votes

This is what we have ended up with:

EDIT: Changed to an HttpWebRequest so redirects are supported

a) When a VM is deployed / OS patched we poll the httpsIn endpoint within the OnStart()

using System;
using System.Net;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        // Note: the Web Requests all run in IIS, not from this process.
        // So, we aren't disabling certs globally, just for checks against our own endpoint.
        ServicePointManager.ServerCertificateValidationCallback += (o, certificate, chain, errors) => true;

        var address = GetAddress("httpIn");

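        var request = (HttpWebRequest)WebRequest.Create(address);
        request.MaximumAutomaticRedirections = 1;
        request.AllowAutoRedirect = false;
        // Assumption: HttpWebRequest.Timeout defaults to 100 seconds, so a longer
        // warm-up needs a higher limit. 15 minutes is an illustrative value.
        request.Timeout = (int)TimeSpan.FromMinutes(15).TotalMilliseconds;
        // GetResponse blocks until the site answers, i.e. until Application_Start has finished.
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            //_logger.WriteEventLog($"Response: '{response.StatusCode}'");
        }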
        return base.OnStart();
    }

    static Uri GetAddress(string endpointName)
    {
        var endpoint = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints[endpointName];
        var address = $"{endpoint.Protocol}://{endpoint.IPEndpoint.Address}:{endpoint.IPEndpoint.Port}";
        return new Uri(address);
    }
}

b) For app pool recycles, we report Busy from Global.asax until the cache has loaded:

public class RoleEnvironmentReadyCheck
{
    // volatile: written from Application_Start, read on the StatusCheck event thread.
    volatile bool _isBusy = true;

    public RoleEnvironmentReadyCheck()
    {
        RoleEnvironment.StatusCheck += RoleEnvironment_StatusCheck;
    }

    void RoleEnvironment_StatusCheck(object sender, RoleInstanceStatusCheckEventArgs e)
    {
        if (_isBusy)
        {
            e.SetBusy();
        }
    }

    public void SetReady()
    {
        _isBusy = false;
    }
}

public class WebApiApplication : HttpApplication
{
    protected void Application_Start()
    {
        var roleStatusCheck = new RoleEnvironmentReadyCheck();
        //SuperLoadCache()
        roleStatusCheck.SetReady();
    }
}
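
The Busy gate above boils down to a flag that is set once warm-up finishes and read on every status check, possibly from another thread. A language-neutral sketch of that pattern, in Python rather than C#: `StatusCheckEvent` is a hypothetical stand-in for RoleInstanceStatusCheckEventArgs, and nothing here calls the Azure runtime.

```python
import threading


class StatusCheckEvent:
    """Hypothetical stand-in for RoleInstanceStatusCheckEventArgs."""

    def __init__(self):
        self.busy = False

    def set_busy(self):
        self.busy = True


class ReadyCheck:
    """Reports busy on every status check until set_ready() is called."""

    def __init__(self):
        # threading.Event gives a thread-safe flag without explicit locking.
        self._ready = threading.Event()

    def on_status_check(self, event):
        # Mirrors RoleEnvironment_StatusCheck: mark the instance busy
        # while warm-up is still running.
        if not self._ready.is_set():
            event.set_busy()

    def set_ready(self):
        self._ready.set()
```

A status check that arrives before `set_ready()` is marked busy; any check after it is left untouched.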

c) For the app pool recycles, we pick a time of day (03:00 AM), stagger the role instances by 30 minutes, and disable the idle timeout in a PowerShell script, ConfigureIIS.ps1:

$InstanceId = $env:INSTANCEID
$role = ($InstanceId -split '_')[-1]
$roleId = [int]$role
$gapInMinutes = 30
$startTime = New-TimeSpan -Hours 3
$offset = New-TimeSpan -Minutes ($gapInMinutes * $roleId)
$time = $startTime + $offset
$timeInDay = "{0:hh\:mm\:ss}" -f $time

Write-Host "ConfigureIIS with role: $role to $timeInDay"

& $env:windir\system32\inetsrv\appcmd set config -section:system.applicationHost/applicationPools /applicationPoolDefaults.processModel.idleTimeout:"00:00:00" /commit:apphost
& $env:windir\system32\inetsrv\appcmd set config -section:system.applicationHost/applicationPools /applicationPoolDefaults.recycling.logEventOnRecycle:"Time,Requests,Schedule,Memory,IsapiUnhealthy,OnDemand,ConfigChange,PrivateMemory" /commit:apphost
& $env:windir\system32\inetsrv\appcmd set config -section:system.applicationHost/applicationPools /applicationPoolDefaults.recycling.periodicRestart.time:"00:00:00" /commit:apphost
& $env:windir\system32\inetsrv\appcmd set config -section:system.applicationHost/applicationPools /~"applicationPoolDefaults.recycling.periodicRestart.schedule" /commit:apphost
& $env:windir\system32\inetsrv\appcmd set config -section:system.applicationHost/applicationPools /+"applicationPoolDefaults.recycling.periodicRestart.schedule.[value='$timeInDay']" /commit:apphost
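
To make the stagger arithmetic concrete, here is the same calculation as a small Python sketch (not part of the deployment). The instance IDs used below, such as `WebRole_IN_1`, are assumed examples of the `..._IN_<n>` shape the script relies on.

```python
from datetime import timedelta


def recycle_time(instance_id, start_hour=3, gap_minutes=30):
    """Return the hh:mm:ss recycle time for an instance ID ending in _<n>."""
    index = int(instance_id.rsplit("_", 1)[-1])
    offset = timedelta(hours=start_hour, minutes=gap_minutes * index)
    # Drop whole days so the result is always a valid time of day,
    # as the hh\:mm\:ss format in the PowerShell script does.
    total = int(offset.total_seconds()) % 86400
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


for instance in ("WebRole_IN_0", "WebRole_IN_1", "WebRole_IN_2"):
    print(instance, recycle_time(instance))
```

With a 30 minute gap, instance 0 recycles at 03:00:00 and instance 1 at 03:30:00. The schedule wraps after 24 hours, so instance 48 would land on the same time as instance 0.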

ConfigureIIS.cmd runs the script; the instance ID is passed in through the INSTANCEID environment variable:

PowerShell -ExecutionPolicy Unrestricted .\ConfigureIIS.ps1 >> "%TEMP%\StartupLog.txt" 2>&1

EXIT /B 0

The task is set within ServiceDefinition.csdef:

<Task commandLine="ConfigureIIS.cmd" executionContext="elevated" taskType="simple">
  <Environment>
    <Variable name="INSTANCEID">
      <RoleInstanceValue xpath="/RoleEnvironment/CurrentInstance/@id"/>
    </Variable>
  </Environment>
</Task>