6
votes

As the load on our Azure website has increased (along with the complexity of the work that it's doing), we've noticed that we're running into CPU utilization issues. CPU utilization gradually creeps upward over the course of several hours, even as traffic levels remain fairly steady. Over time, if the Azure stats are correct, we'll somehow manage to get > 60 seconds of CPU per instance (not quite clear how that works), and response times will start increasing dramatically.

If I restart the web server, CPU drops immediately, and then begins a slow creep back up. For instance, in the image below, you can see CPU creeping up, followed by the restart (with the red circle) and then a recovery of the CPU.

Azure website CPU graph

I'm strongly inclined to suspect that this is a problem somewhere in my own code, but I'm scratching my head as to how to figure it out. So far, any attempts to reproduce this on my dev or testing environments have proven ineffectual. Nearly all the suggestions for profiling IIS/C# performance seem to presume either direct access to the machine in question or at least a "Cloud Service" instance rather than an Azure Website.

I know this is a bit of a long shot, but... any suggestions, either for what it might be, or how to troubleshoot it?

(We're using C# 5.0, .NET 4.5.1, ASP.NET MVC 5.2.0, WebAPI 2.2, EF 6.1.1, Azure System Bus, Azure SQL Database, Azure redis cache, and async for every significant code path.)

Edit 8/5/14 - I've tried some of the suggestions below. But when the website is truly busy, i.e., ~100% CPU utilization, any attempt to download a mini-dump or GC dump results in a 500 error, with the message, "Not enough storage." The various times that I have been able to download a mini-dump or GC dump, they haven't shown anything particularly interesting, at least, so far as I could figure out. (For instance, the most interesting thing in the GC dump was a half dozen or so >100KB string instances - those seem to be associated with the bundling subsystem in some fashion, so I suspect that they're just cached ScriptBundle or StyleBundle instances.)

1
Can you remote debug? Pause the debugger 10 times. It will stop most where the code spends most time.usr
Any threading, timers, concurrency, ...? Any background work? Any static state (static variables, Application.Items, ...)?usr
@usr - I'll try some remote debugging late tonight, when the impact will be a bit less. As for statics, yeah, I've got a few - mostly for logging (using NLog). I thought it might be a static instance of the EF DbContext I found buried deep in some old code, but I fixed that to little effect. Currently reviewing any background tasks.Ken Smith
This could be rogue background work piling up or a shared data structure filling up.usr

1 Answers

3
votes
  1. Try remote debugging to your site from visual studio.
  2. Try https://{sitename}.scm.azurewebsites.net/ProcessExplorer/ there you can take memory dumps an GC dumps of your w3wp process. Then you can compare 2 GC dumps to find memory leaks and open the memory dump with windbg/VS for further "offline" debugging.