Azure worker role instance got stuck

Question

I have a continuously running Worker role that executes multiple jobs. The jobs process queue messages. Normally, if there is an exception or any other problem, the job fails, the queued message goes back into the queue, and the job tries to reprocess it.

But since last month I have been facing a weird issue: I found that no messages had been processed for the past day or so. I investigated in the Azure Portal and saw that the worker role instance still had a "running" status. For some reason, the job did not time out or quit, but all the messages were sitting in the queue, unprocessed.

There were also no log entries and no exceptions or errors thrown (I have a decent amount of logging and exception handling in the method).

I restarted the worker role via the Azure Portal, and as soon as it came back up, all of the backed-up queue messages began processing immediately.

Can anyone help with solutions or suggestions for handling this case?

kwill · Accepted Answer · 2019-09-21T19:52:25

RDP to the VM and troubleshoot it just as you would on-prem. What do the performance counters show you? Is your process (or any other) consuming CPU? Is there anything in the event logs? Take a hang dump of WaWorkerHost.exe and check the call stacks to see what your code is doing, or whether it is stuck in something like a deadlock or an infinite loop.

You can also check the guest agent and host bootstrapper logs (see https://blogs.msdn.microsoft.com/kwill/2013/08/09/windows-azure-paas-compute-diagnostics-data/), but since you said the portal was reporting that the instance was in the Ready state, I don't think you will find anything there. It sounds like 'Azure' (the role host processes) is working fine and the problem is something within WaWorkerHost.exe (your code).
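To avoid finding the hang a day late next time, one option is a small watchdog that notices when the job loop stops making progress and captures every thread's call stack at that moment, which is roughly what you would read out of a hang dump. The following is only a minimal sketch of that idea, written in Python for illustration rather than in the language of your worker role. `Heartbeat`, `dump_all_stacks`, and `watchdog` are hypothetical names, not part of any Azure SDK, and the thresholds are assumptions you would tune.

```python
import sys
import threading
import time
import traceback


class Heartbeat:
    """Records the last time the job loop reported progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_beat = time.monotonic()

    def beat(self):
        # Call this from the job loop, e.g. after each message is handled.
        with self._lock:
            self._last_beat = time.monotonic()

    def seconds_since_beat(self):
        with self._lock:
            return time.monotonic() - self._last_beat


def dump_all_stacks():
    """Return the current call stack of every thread as one string."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for thread_id, frame in sys._current_frames().items():
        parts.append(f"Thread {names.get(thread_id, thread_id)}:\n")
        parts.extend(traceback.format_stack(frame))
    return "".join(parts)


def watchdog(heartbeat, stall_after, on_stall, poll_interval=1.0, stop_event=None):
    """Call on_stall(stack_dump) once if no heartbeat arrives for stall_after seconds."""
    stop_event = stop_event or threading.Event()
    # Event.wait returns False on timeout, so this polls until stopped.
    while not stop_event.wait(poll_interval):
        if heartbeat.seconds_since_beat() > stall_after:
            on_stall(dump_all_stacks())
            return
```

In use, the job loop would call `heartbeat.beat()` each time it finishes a message or an empty poll, and the watchdog would run on its own thread. A reasonable `on_stall` handler would log the stack dump and then deliberately end the process, on the assumption that the platform then restarts the instance, which automates the manual restart that cleared the backlog. The stall threshold has to be longer than the slowest legitimate message, otherwise healthy work gets killed.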
