2
votes

We are intermittently seeing duplicate HTTP request hitting our Azure Cloud Service application. Our application uses Azure Traffic Manager distribute traffic across two regions (for DR purposes). We have 3 large web role instances per region.

Has anyone encountered (ideally, solved) a similar issue in Azure?

More specifics below:

1. Trace logging:

Trace logging is implemented in the MVC controller action which appears to be duplicated:

    [HttpPost]
    public JsonResult JsonPost(FormCollection formData)
    {
        Logger.Info(LogFormatter.FormatMessage(ToString(), "JsonPost", UserHelper.CustomerSession.CustomerId, "Begin JSON post."));
        var result = new DefaultJsonResult();
        try
        {
            #region Build and validate foo model
            var fooModel = BuildFooModel(formData);
            .
            .
            .
        }
        catch (Exception ex)
        {
            Logger.Error(LogFormatter.FormatException(ex, ToString(), "JsonPost",
                UserHelper.CustomerSession.CustomerId, "Error while proccessing Foo."));
            result.Success = false;
            result.Action = new JsonResultAction(JsonResultActionType.Display);
            result.Error = new JsonResultError();
            result.Error.Details.Add(new JsonResultErrorDetail
            {
                Type = "FooProcessing",
                Message = "Error while proccessing Foo."
            });
        }
        Logger.Info(LogFormatter.FormatMessage(ToString(), "JsonPost", UserHelper.CustomerSession.CustomerId, "End JSON post."));

        return Json(result);
    }                

In the trace logs we see multiple instances of the following:

Message: Calculating bar.
Timestamp: 01/22/2015 03:42:28
Controller Name: Com.Web.Controllers.FooController
Action Name: bar
User Id: 85c5d33f-05b3-40d5-9e73-1219ca490e7e
RoleInstance=StoreWebApp_IN_0; WindowsIdentity.Name=NT AUTHORITY\NETWORK SERVICE; ManagedThreadId=15;

Message: Begin JSON post.
Timestamp: 01/22/2015 03:43:22
Controller Name: Com.Web.Controllers.FooController
Action Name: JsonPost
User Id: 85c5d33f-05b3-40d5-9e73-1219ca490e7e
RoleInstance=StoreWebApp_IN_0; WindowsIdentity.Name=NT AUTHORITY\NETWORK SERVICE; ManagedThreadId=28;

Message: Sending Foo Notification
Timestamp: 01/22/2015 03:43:28
Controller Name: Com.Web.Business.Modules.Foo.FooModule
Action Name: ProcessFoo
User Id: 85c5d33f-05b3-40d5-9e73-1219ca490e7e
RoleInstance=StoreWebApp_IN_0; WindowsIdentity.Name=NT AUTHORITY\NETWORK SERVICE; ManagedThreadId=28;

Message: Begin JSON post.
Timestamp: 01/22/2015 03:43:30
Controller Name: Com.Web.Controllers.FooController
Action Name: JsonPost
User Id: 85c5d33f-05b3-40d5-9e73-1219ca490e7e
RoleInstance=StoreWebApp_IN_2; WindowsIdentity.Name=NT AUTHORITY\NETWORK SERVICE; ManagedThreadId=13;

A couple things of note:

  1. The second "Begin JSON post" happens before the first request logs its "End JSON post" (which is never logged BTW)

  2. The second "Begin JSON post" occurs on another web role instance

2. IIS Logs:

The corresponding IIS logs are as follows:

2015-01-22 03:42:29 W3SVC1273337584 RD00155DE0F696 127.0.0.13 POST /foo/bar
- 443 [email protected] 128.0.0.28 HTTP/1.1 Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_8_2)+AppleWebKit/536.26.17+(KHTML,+like+Gecko)+Version/6.0.2+Safari/536.26.17

2015-01-22 03:43:30 W3SVC1273337584 RD00155DE0CAC8 127.0.0.54 POST /foo/jsonpost
- 443 [email protected] 128.0.0.28 HTTP/1.1 Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_8_2)+AppleWebKit/536.26.17+(KHTML,+like+Gecko)+Version/6.0.2+Safari/536.26.17

At this point you're probably asking yourself where is the IIS log entry for the first /foo/jsonpost request? This behavior is consistent for every instance of the duplicated request, however, the reason is still a mystery.

Thanks for reading. Any insight or even suggestions on how to troubleshoot further would be appreciated.


Googling "azure duplicated requests" yielded the following results:

Update 1/26/2015
Enabled Failed Request Tracing in hopes to capture the phantom (first /foo/jsonpost) request and isolate where it's coming from. Unfortunately, still no luck, Failed Request Tracing is consistent with the IIS logs.

1
Perhaps this is due lengthy / large post requests being terminated by the Cloud Service Load Balancer, resulting in the client re-posting the data? I've seen this behaviour in other cloud load balancing scenarios.Simon W
Our posts aren't necessarily long running (around 10 seconds max). Regardless, I'd expect to see an IIS log entry for the original post if it originated from the clientside. I do think the issue is related to the load balancer, but I can't explain the missing IIS log entry.HOCA
My guess is that your first "Begin JSON post" is hanging somewhere in the code and timing out, and then the client is disconnecting and reconnecting to the other instance. You may see this failed request in the http.sys logs. I would start with additional logging within the JsonPost method to see how far into the code that particular request is getting.kwill
The 2nd log entry for "Begin JSON post" is about 10s after the first. You say that posts take about 10s. Looks like the client reposted. Can you exclude that possibility?usr
@kwill - We have extensive tracing in the JsonPost method. So far, the only pattern we see is the absence of the final trace message in the method. To your point, we'll add more logging to understand what's happening in the gap.HOCA

1 Answers

2
votes

Case Closed:

The missing log entry was caused by a stack overflow due to flawed third party API communication logic. When IIS restarted it resumed/recovered the request that caused the stack overflow. Because the connection was kept alive throughout the recycle the client only saw the response to the second execution. The reason the stack overflow didn't occur with the second attempt is that execution takes a different code path than the first execution.

How did we know there was a stack overflow (besides a TODO comment that read "prevent stack overflow"):

  • In system event logs: "A process serving application pool 'ASP.NET v4.0' suffered a fatal communication error with the Windows Process Activation Service. The process id was '5328'. The data field contains the error number."
  • In the HTTPERR (IIS) logs, "Connection_Abandoned_By_ReqQueue" error occurs, meaning a worker process from the application pool has quit unexpectedly or orphaned a pending request by closing its handle
  • Crash logs show 0x800703e9 - “Recursion too deep; the stack overflowed.

Hope these findings are helpful for other lost souls trying to solve the mystery of missing IIS logs...