3
votes

When running locally in the emulator, the web worker works fine. However, whenever I update my web worker running on an Azure VM, I get the following exceptions in the event viewer and the role won't start:

Application: WaWorkerHost.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.AggregateException
Stack:
at System.Threading.Tasks.Task.Wait(Int32, System.Threading.CancellationToken)
at System.Threading.Tasks.Task.Wait()
at Foo.PushProcess.WorkerRole.Run()
at Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.StartRoleInternal()
at Microsoft.WindowsAzure.ServiceRuntime.Implementation.Loader.RoleRuntimeBridge.b__2()
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at System.Threading.ThreadHelper.ThreadStart()

Inner Exception: A task was canceled.

Faulting application name: WaWorkerHost.exe, version: 2.6.1198.712, time stamp: 0x54eba731
Faulting module name: KERNELBASE.dll, version: 6.3.9600.17415, time stamp: 0x54505737
Exception code: 0xe0434352
Fault offset: 0x0000000000008b9c
Faulting process id: 0xfb8
Faulting application start time: 0x01d11e3128981a5d
Faulting application path: E:\base\x64\WaWorkerHost.exe
Faulting module path: D:\Windows\system32\KERNELBASE.dll
Report Id: 30631c5c-8a25-11e5-80c6-000d3a22f3ec
Faulting package full name:
Faulting package-relative application ID:

Session "MA_ETWSESSION_WAD_415df88f8a0447178dbd4c18f1349f0e_Foo.PushProcess_Foo.PushProcess_IN_0" failed to start with the following error: 0xC0000035

This is the relevant code:

public override void Run()
{
    Trace.TraceInformation("Foo.PushProcess is running");

    try
    {
        RunAsync(_cancellationTokenSource.Token).Wait(); // This is where the exceptions point to
    }
    catch (Exception ex)
    {
        Trace.TraceError("[WORKER] Run error: " + ex);
    }
    finally
    {
        _runCompleteEvent.Set();
    }
}

public override bool OnStart()
{
    // Set the maximum number of concurrent connections
    ServicePointManager.DefaultConnectionLimit = 12;

    // For information on handling configuration changes
    // see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.

    bool result = base.OnStart();

    _storageAccount = CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("StorageConnectionString"));
    var queueClient = _storageAccount.CreateCloudQueueClient();
    _pushQueue = queueClient.GetQueueReference("pushes");
    _pushQueue.CreateIfNotExists();

    CreatePushBroker();

    Trace.TraceInformation("Foo.PushProcess has been started");

    return result;
}

private async Task RunAsync(CancellationToken cancellationToken)
{
    while (!cancellationToken.IsCancellationRequested)
    {
        Trace.TraceInformation("Working");
        CloudQueueMessage message = null;
        try
        {
            message = _pushQueue.GetMessage();
            if (message != null)
            {
                ProcessItem(message);
            }
        }
        catch (Exception ex)
        {
            if (message != null && message.DequeueCount > 5)
                _pushQueue.DeleteMessage(message);

            Trace.TraceError("[WORKER] Retrieval Failure: " + ex);
        }

        await Task.Delay(1000, cancellationToken);
    }
}

Note that some code has been omitted; however, all of it runs after initialisation and, in theory, is never reached before this exception is thrown.

I am completely at a loss as to what could cause this issue. Any help would be appreciated - even if only to aid me getting a helpful exception.

UPDATE

I have now reduced my code to the below - it is as simple as a web worker can possibly be - and I am still getting the exceptions. I believe that either the old worker is being cached, or there is an issue in the deployment procedure.

public override void Run()
{
    Trace.TraceInformation("Foo.PushProcess is running");

    try
    {
        RunAsync(_cancellationTokenSource.Token).Wait(); // This is where the exceptions point to
    }
    catch (Exception ex)
    {
        Trace.TraceError("[WORKER] Run error: " + ex);
    }
    finally
    {
        _runCompleteEvent.Set();
    }
}

public override bool OnStart()
{
    // Set the maximum number of concurrent connections
    ServicePointManager.DefaultConnectionLimit = 12;

    // For information on handling configuration changes
    // see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.

    bool result = base.OnStart();

    return result;
}

private async Task RunAsync(CancellationToken cancellationToken)
{
    while (!cancellationToken.IsCancellationRequested)
    {
        Trace.TraceInformation("Working");

        // code removed for testing - no work is being done.

        await Task.Delay(1000, cancellationToken);
    }
}

Try sending a tweet with a link to this question to @AzureSupport - Sten Petrov
@StenPetrov Thanks for that, I'll send it now. - Rory McCrossan
This issue is now solved - I simply deleted the original VM that was running this worker role and created a new one. The code then ran on the new remote machine without any errors. Unfortunately, I am unable to explain why. - Rory McCrossan

2 Answers

0
votes

I gave this a whirl and wasn't able to reproduce it on my end. I have VS 2015 Enterprise (14.0.23107.0 D14REL) from an MSDN Azure image I deployed, running with .NET Fx version 4.6. I have Azure Tools and SDK 2.8 installed. I created a new Azure Cloud Service using .NET Fx 4.5.2 and added a single worker role.

I ran a stripped-down template based on your code, as follows:

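using System;
using System.Diagnostics;
using System.Net;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

public class WorkerRole : RoleEntryPoint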
{
    private readonly CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();
    private readonly ManualResetEvent runCompleteEvent = new ManualResetEvent(false);
    private CloudQueue _pushQueue;
    private CloudStorageAccount _storageAccount;

    public override void Run()
    {
        Trace.TraceInformation("WorkerRole1 is running");

        try
        {
            this.RunAsync(this.cancellationTokenSource.Token).Wait();
        }
        catch (Exception ex)
        {
            Trace.TraceError("[WORKER] Run error: " + ex);
        }
        finally
        {
            this.runCompleteEvent.Set();
        }
    }

    public override bool OnStart()
    {
        // Set the maximum number of concurrent connections
        ServicePointManager.DefaultConnectionLimit = 12;

        // For information on handling configuration changes
        // see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.
        bool result = base.OnStart();
        _storageAccount = CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("StorageConnectionString"));
        var queueClient = _storageAccount.CreateCloudQueueClient();
        _pushQueue = queueClient.GetQueueReference("pushes");
        _pushQueue.CreateIfNotExists();

        CreatePushBroker();

        Trace.TraceInformation("Foo.PushProcess has been started");

        return result;

    }

    private void CreatePushBroker()
    {
        return;
    }

    public override void OnStop()
    {
        Trace.TraceInformation("WorkerRole1 is stopping");

        this.cancellationTokenSource.Cancel();
        this.runCompleteEvent.WaitOne();

        base.OnStop();

        Trace.TraceInformation("WorkerRole1 has stopped");
    }

    private async Task RunAsync(CancellationToken cancellationToken)
    {
        // TODO: Replace the following with your own logic.
        while (!cancellationToken.IsCancellationRequested)
        {
            Trace.TraceInformation("Working");
            CloudQueueMessage message = null;
            try
            {
                message = _pushQueue.GetMessage();
                if (message != null)
                {
                    ProcessItem(message);
                }
            }
            catch (Exception ex)
            {
                if (message != null && message.DequeueCount > 5)
                    _pushQueue.DeleteMessage(message);

                Trace.TraceError("[WORKER] Retrieval Failure: " + ex);
            }

            await Task.Delay(1000, cancellationToken);

        }
    }

    private void ProcessItem(CloudQueueMessage message)
    {
        return;
    }
}

That runs without issue in the local emulator. I then deployed it to West US with IntelliTrace enabled, on a small instance VM, and had no deployment issues. It is running on a WA-GUEST-OS-4.26_201511-0 guest worker role image. I was able to RDP into the machine and didn't see any issues related to the code or the machine. Do you have any other binaries you might not be including in your packages, or perhaps some dependencies that are not defined properly, or storage account naming issues?

Here was the deployment log for me. As you can see, it took about 7 minutes as I had it pull storage from East US just for fun:

1:11:25 AM - Warning: There are package validation warnings.
1:11:26 AM - Checking for Remote Desktop certificate...
1:11:26 AM - Uploading Certificates...
1:11:42 AM - Applying Diagnostics extension.
1:12:24 AM - Preparing deployment for AzureCloudService1 - 11/24/2015 1:11:19 AM with Subscription ID '9a4715f5-acb8-4a18-8259-1c28b92XXXXX' using Service Management URL 'https://management.core.windows.net/'...
1:12:24 AM - Connecting...
1:12:24 AM - Verifying storage account 'ericgoleastus'...
1:12:24 AM - Uploading Package...
1:12:28 AM - Creating...
1:13:15 AM - Created Deployment ID: c5f26568707b46a3bd42466dd0bf7509.
1:13:15 AM - Instance 0 of role WorkerRole1 is creating the virtual machine
1:13:15 AM - Starting...
1:13:32 AM - Initializing...
1:14:36 AM - Instance 0 of role WorkerRole1 is starting the virtual machine
1:16:11 AM - Instance 0 of role WorkerRole1 is in an unknown state
1:16:43 AM - Instance 0 of role WorkerRole1 is busy Details: Starting role... System is initializing. [2015-11-24T01:16:08Z]
1:19:50 AM - Instance 0 of role WorkerRole1 is ready
1:19:50 AM - Created web app URL: http://quequetest.cloudapp.net/
1:19:50 AM - Complete.

Let us know if you can get some more details possibly with IntelliTrace enabled.
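
If IntelliTrace is not an option, one way to get a more helpful message out of the AggregateException might be to unwrap it and log each inner exception on its own. The following is only a rough sketch: the RunDiagnostics class and its WaitAndLog method are hypothetical names that do not appear in the original code, and treating cancellation as a normal shutdown is an assumption about the intended behaviour.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original worker role.
public static class RunDiagnostics
{
    // Blocks on the task and logs every inner exception separately,
    // so the trace shows the real cause instead of only "AggregateException".
    public static void WaitAndLog(Task task)
    {
        try
        {
            task.Wait();
        }
        catch (AggregateException ae)
        {
            // Flatten() collapses nested AggregateExceptions into a single level.
            foreach (Exception inner in ae.Flatten().InnerExceptions)
            {
                if (inner is OperationCanceledException)
                {
                    // Assumption: cancellation (e.g. from OnStop) is expected, so it is
                    // logged as information. TaskCanceledException derives from
                    // OperationCanceledException, so "A task was canceled." lands here.
                    Trace.TraceInformation("[WORKER] Run loop canceled: " + inner.Message);
                }
                else
                {
                    Trace.TraceError("[WORKER] Run error: " + inner);
                }
            }
        }
    }
}
```

In Run(), the call would then look something like RunDiagnostics.WaitAndLog(RunAsync(_cancellationTokenSource.Token)); inside the existing try/finally, so that _runCompleteEvent.Set() still executes.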

Regards, Eric

0
votes

To fix this issue I simply deleted the original Cloud VM instance which held the worker role, recreated it and re-published the role. From that point it has worked absolutely fine.

I am still unable to determine what caused the error, and have had no further issues like this with any other worker role. My assumption here was that there was a configuration issue with the VM which could not be amended through code or the Azure portal.