2
votes

I have a "Triggered" Webjob in Azure that is stuck on "Running" It has been in this state for two days, it is scheduled to run every 5 minutes.

This Webjob has previously worked without any problem. This problem seems to have occurred on a Saturday (my timezone), which means that no one was playing with it when it broke.

What I have tried so far:

  1. Restarting the site the Webjob is attached to.
  2. Running the job manually from the azure portal*
  3. Redeploying the site
  4. Setting WEBJOBS_STOPPED=1 in App Settings
  5. Changing the schedule time from 5 minutes to 6 minutes

*This gave me an error:

Could not run job: 'EnrolmentProcessor'.
Please try again. If the problem persists, contact support.

All of these have not affected the process and it is still stuck in a "Running" State.

I have looked at the folders from the KUDU debug console and have noticed the following:

The process where it has failed has these log lines:

[04/15/2016 22:20:02 > 1e4ce4: SYS INFO] Status changed to Initializing
[04/15/2016 22:20:02 > 1e4ce4: SYS INFO] Run script 'TERACC.WebJob.EnrolmentProcessor.exe' with script host - 'WindowsScriptHost'
[04/15/2016 22:20:02 > 1e4ce4: SYS INFO] Status changed to Running

The previous process that succeeded has these log lines:

[04/15/2016 22:15:01 > 1e4ce4: SYS INFO] Status changed to Initializing
[04/15/2016 22:15:01 > 1e4ce4: SYS INFO] Run script 'TERACC.WebJob.EnrolmentProcessor.exe' with script host - 'WindowsScriptHost'
[04/15/2016 22:15:01 > 1e4ce4: SYS INFO] Status changed to Running
[04/15/2016 22:18:51 > 1e4ce4: SYS INFO] Status changed to Success

There is a file named triggeredJob.lock which has the following StackTrace at the time of failure:

2016-04-15T22:20:02   at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)
   at System.Environment.get_StackTrace()
   at Kudu.Core.Infrastructure.LockFile.WriteLockInfo(Stream lockStream)
   at Kudu.Core.Infrastructure.LockFile.Lock()
   at Kudu.Core.Jobs.TriggeredJobRunner.StartJobRun(TriggeredJob triggeredJob, JobSettings jobSettings, String trigger, Action`2 reportAction)
   at Kudu.Core.Jobs.TriggeredJobsManager.InvokeTriggeredJob(String jobName, String arguments, String trigger)
   at Kudu.Services.Jobs.JobsController.InvokeTriggeredJob(String jobName, String arguments)
   at lambda_method(Closure , Object , Object[] )
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ActionExecutor.<>c__DisplayClass10.<GetExecutor>b__9(Object instance, Object[] methodParameters)
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ActionExecutor.Execute(Object instance, Object[] arguments)
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ExecuteAsync(HttpControllerContext controllerContext, IDictionary`2 arguments, CancellationToken cancellationToken)
   at System.Web.Http.Controllers.ApiControllerActionInvoker.<InvokeActionAsyncCore>d__0.MoveNext()
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Web.Http.Controllers.ApiControllerActionInvoker.InvokeActionAsyncCore(HttpActionContext actionContext, CancellationToken cancellationToken)
   at System.Web.Http.Controllers.ApiControllerActionInvoker.InvokeActionAsync(HttpActionContext actionContext, CancellationToken cancellationToken)
   at System.Web.Http.Controllers.ActionFilterResult.<ExecuteAsync>b__0(ActionInvoker innerInvoker)
   at System.Web.Http.Controllers.ActionFilterResult.<>c__DisplayClass10`1.<InvokeActionWithActionFilters>b__f()
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__5.MoveNext()
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Web.Http.Filters.ActionFilterAttribute.CallOnActionExecutedAsync(HttpActionContext actionContext, CancellationToken cancellationToken, Func`1 continuation)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__0.MoveNext()
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Web.Http.Filters.ActionFilterAttribute.ExecuteActionFilterAsyncCore(HttpActionContext actionContext, CancellationToken cancellationToken, Func`1 continuation)
   at System.Web.Http.Filters.ActionFilterAttribute.System.Web.Http.Filters.IActionFilter.ExecuteActionFilterAsync(HttpActionContext actionContext, CancellationToken cancellationToken, Func`1 continuation)
   at System.Web.Http.Controllers.ActionFilterResult.<>c__DisplayClassb.<>c__DisplayClassd.<InvokeActionWithActionFilters>b__9()
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__5.MoveNext()
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Web.Http.Filters.ActionFilterAttribute.CallOnActionExecutedAsync(HttpActionContext actionContext, CancellationToken cancellationToken, Func`1 continuation)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__0.MoveNext()
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Web.Http.Filters.ActionFilterAttribute.ExecuteActionFilterAsyncCore(HttpActionContext actionContext, CancellationToken cancellationToken, Func`1 continuation)
   at System.Web.Http.Filters.ActionFilterAttribute.System.Web.Http.Filters.IActionFilter.ExecuteActionFilterAsync(HttpActionContext actionContext, CancellationToken cancellationToken, Func`1 continuation)
   at System.Web.Http.Controllers.ActionFilterResult.<>c__DisplayClassb.<>c__DisplayClassd.<InvokeActionWithActionFilters>b__9()
   at System.Web.Http.Controllers.ActionFilterResult.<ExecuteAsync>d__2.MoveNext()
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Web.Http.Controllers.ActionFilterResult.ExecuteAsync(CancellationToken cancellationToken)
   at System.Web.Http.Controllers.ExceptionFilterResult.<ExecuteAsync>d__0.MoveNext()
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Web.Http.Controllers.ExceptionFilterResult.ExecuteAsync(CancellationToken cancellationToken)
   at System.Web.Http.ApiController.ExecuteAsync(HttpControllerContext controllerContext, CancellationToken cancellationToken)
   at System.Web.Http.Dispatcher.HttpControllerDispatcher.<SendAsync>d__1.MoveNext()
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Web.Http.Dispatcher.HttpControllerDispatcher.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpMessageInvoker.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Web.Http.Dispatcher.HttpRoutingDispatcher.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DelegatingHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Web.Http.HttpServer.<SendAsync>d__0.MoveNext()
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Web.Http.HttpServer.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpMessageInvoker.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Web.Http.WebHost.HttpControllerHandler.<ProcessRequestAsyncCore>d__0.MoveNext()
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[TStateMachine](TStateMachine& stateMachine)
   at System.Web.Http.WebHost.HttpControllerHandler.ProcessRequestAsyncCore(HttpContextBase contextBase)
   at System.Web.Http.WebHost.HttpControllerHandler.ProcessRequestAsync(HttpContext context)
   at System.Web.HttpTaskAsyncHandler.<>c__DisplayClass4_0.<System.Web.IHttpAsyncHandler.BeginProcessRequest>b__0()
   at System.Web.TaskAsyncHelper.BeginTask(Func`1 taskFunc, AsyncCallback callback, Object state)
   at System.Web.HttpTaskAsyncHandler.System.Web.IHttpAsyncHandler.BeginProcessRequest(HttpContext context, AsyncCallback cb, Object extraData)
   at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
   at System.Web.HttpApplication.PipelineStepManager.ResumeSteps(Exception error)
   at System.Web.HttpApplication.BeginProcessRequestNotification(HttpContext context, AsyncCallback cb)
   at System.Web.HttpRuntime.ProcessRequestNotificationPrivate(IIS7WorkerRequest wr, HttpContext context)
   at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr rootedObjectsPointer, IntPtr nativeRequestContext, IntPtr moduleData, Int32 flags)
   at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr rootedObjectsPointer, IntPtr nativeRequestContext, IntPtr moduleData, Int32 flags)
   at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion(IntPtr pHandler, RequestNotificationStatus& notificationStatus)
   at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr rootedObjectsPointer, IntPtr nativeRequestContext, IntPtr moduleData, Int32 flags)
   at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr rootedObjectsPointer, IntPtr nativeRequestContext, IntPtr moduleData, Int32 flags)

This indicates to me that something has gone wrong when trying to process the Webjob and left it in this corrupt state.

I am not worried about losing data in the process that is "running", I just need to get the Webjob running properly again. What can I do to abort the one it is stuck on and get it to start up again?

1
Can you share your web app name, either directly or indirectly? This will help investigate. Thanks!David Ebbo
@DavidEbbo - sure thing. Here is a site on the same subscription: testforfurtherinfo.azurewebsites.net. The site with the erroring webjob ends in test (not testapi). The Webjob that is stuck is called EnrolmentProcessormadbrendon
seems there is a dead lock. Can you show us you code?Alex Chen-WX

1 Answers

4
votes

It seems that it got into a bad state. I think the key is to delete that triggeredJob.lock file. The problem is that it's locked and doesn't want to get deleted.

Please try these steps to see if that allows deletion:

  • Stop both the site and the SCM site using these special steps
  • Connect using FTP and delete the lock file
  • Restart the site (e.g. from portal)

If that doesn't work, we'll try something else. I'd like to understand the root cause, but let's try to get you running first.