My client has an ASP.NET application installed on two production servers (balanced with NLB, but that's irrelevant). Both servers crash every 3-4 hours with the following event viewer logged error:

Faulting application name: w3wp.exe, version: 7.5.7601.17514, time stamp: 0x4ce7afa2
Faulting module name: clr.dll, version: 4.0.30319.18034, time stamp: 0x50b5a783
Exception code: 0xc00000fd Fault offset: 0x000000000001a840
Faulting process id: 0xd50
Faulting application start time: 0x01ce97fe076d27b4
Faulting application path: c:\windows\system32\inetsrv\w3wp.exe
Faulting module path: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll
Report Id: e0c90a5f-0455-11e3-8f0e-005056891553
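The "Faulting application start time" field appears to be a Windows FILETIME, i.e. a count of 100-nanosecond intervals since 1601-01-01 UTC. A minimal Python sketch, assuming that encoding, converts it to a readable timestamp so you can compare it with the crash time and see how long the worker process ran:

```python
from datetime import datetime, timedelta, timezone

# FILETIME epoch: 1601-01-01 00:00:00 UTC
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME value (100-ns ticks since 1601-01-01 UTC) to an aware datetime."""
    # 10 ticks of 100 ns make one microsecond
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)

# Value copied from the event log entry above
start = filetime_to_datetime(0x01ce97fe076d27b4)
print(start.isoformat())
```

For the value in this log, the result falls on 13 August 2013 (UTC).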

I have no idea how to debug this or where to start. Just before each crash, processor usage on the server jumps to 100% and stays there. The faulting process is w3wp.exe. I'm not even sure whether my code is generating the error. It's IIS 7.5. Any pointers would be greatly appreciated.

4 Answers

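Exception code 0xc00000fd is STATUS_STACK_OVERFLOW, so it looks like you have a StackOverflowException, which is usually caused by unbounded recursion (a function that keeps calling itself, directly or indirectly). It can't be caught by a regular try/catch block; since .NET 2.0 it terminates the process. You can track the problem down using DebugDiag and WinDbg.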
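If you want to check what an exception code in the event log means, note that it is an NTSTATUS value whose bits split into severity (bits 31-30), a customer flag (bit 29), a facility (bits 27-16) and a code (bits 15-0). A small Python sketch of that decoding; the lookup table holds only two well-known codes and is illustrative, not exhaustive:

```python
# Two well-known NTSTATUS values; a real lookup would need a much larger table.
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}

SEVERITIES = {0: "success", 1: "informational", 2: "warning", 3: "error"}

def decode_ntstatus(value: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": SEVERITIES[(value >> 30) & 0x3],  # bits 31-30
        "customer": bool((value >> 29) & 0x1),        # bit 29
        "facility": (value >> 16) & 0xFFF,            # bits 27-16
        "code": value & 0xFFFF,                       # bits 15-0
        "name": KNOWN_CODES.get(value, "unknown"),
    }

# Exception code from the question's event log
info = decode_ntstatus(0xC00000FD)
print(info)
```

For 0xc00000fd this reports an error-severity, non-customer status with facility 0 and code 0xFD, named STATUS_STACK_OVERFLOW.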

DebugDiag can be configured to generate a crash dump when the StackOverflowException occurs. You can download it from https://www.microsoft.com/en-us/download/details.aspx?id=58210.

  1. Open DebugDiag and click Add Rule.
  2. "Crash" should already be selected. Click Next.
  3. Choose "A specific IIS web application pool" and click Next.
  4. Select the application pool and click Next.
  5. You should be on the Advanced Configuration Window. Click Exceptions under Advanced Settings.
  6. Click Add Exception, choose Stack Overflow, and set the Action Type to Full Userdump.
  7. Click OK, then save the rule and exit.

The next time a StackOverflowException occurs, you'll have a crash dump. Now you need to interpret the dump file.

Debugging Tools for Windows (which includes WinDbg) is part of the Windows SDK and can be downloaded at http://msdn.microsoft.com/en-US/windows/hardware/gg463009/.

  1. To use WinDbg, you'll need symbol files. Download the symbol files and put them in a local folder.
  2. Open WinDbg. On the File menu, click Symbol File Path.
  3. In the Symbol path box, the documentation says to enter SRV*your local folder for symbols*http://msdl.microsoft.com/download/symbols; however, I just entered the local symbol folder and it worked fine.
  4. Close and reopen WinDbg, then click File > Open Crash Dump and select the dump file created by DebugDiag.
  5. In the command window, type .loadby sos clr (this loads the SOS debugging extension from the same folder as clr.dll).
  6. Now type !CLRStack to display the managed call stack of the current thread.

In the results, the problem should be clear: you'll likely see a long run of lines showing the function(s) that were being called repeatedly.
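In a very long !CLRStack listing, you can also find the repeated method mechanically. A rough Python sketch, assuming you have saved the output to a text file (for example with WinDbg's .logopen command) and that each frame line has the form "<Child SP> <IP> <Call Site>"; the method names in the sample are hypothetical:

```python
import re
from collections import Counter

# Assumed frame format: two hex columns (Child SP, IP) followed by the call site.
FRAME_RE = re.compile(r"^\s*[0-9a-fA-F`]+\s+[0-9a-fA-F`]+\s+(.+?)\s*$")

def most_repeated_frames(clrstack_text: str, top: int = 3):
    """Count how often each call site appears and return the most frequent ones."""
    counts = Counter()
    for line in clrstack_text.splitlines():
        match = FRAME_RE.match(line)
        if match:  # header and thread-id lines do not match the frame pattern
            counts[match.group(1)] += 1
    return counts.most_common(top)

# Hypothetical excerpt of !CLRStack output
sample = """\
OS Thread Id: 0x1a2b (12)
        Child SP               IP Call Site
000000000ea3e8a0 000007fe9a1b2c3d MyApp.ProjectTree.ListChildren(Int32)
000000000ea3e920 000007fe9a1b2c3d MyApp.ProjectTree.ListChildren(Int32)
000000000ea3e9a0 000007fe9a1b2c3d MyApp.ProjectTree.ListChildren(Int32)
000000000ea3ea20 000007fe9a1b4e5f MyApp.Default.Page_Load(System.Object, System.EventArgs)
"""

top = most_repeated_frames(sample)
print(top)
```

In a real stack overflow the recursive frame usually appears thousands of times, so it stands out at the top of the counts.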

An addition to the answer above. I was developing an Explorer extension that failed at user login, so to the user it looked like a "flashing screen" (Explorer tried to start, crashed, restarted, and so on). I logged in under another user account and installed DebugDiag and WinDbg. I'm using Windows 8.1 and .NET 4.0 with all the latest updates as of today (Jan 13, 2014). I tried downloading a few symbols locally, but WinDbg couldn't load clr.pdb because of an incorrect signature.

I solved it by using the online symbol server: set the symbol path to "SRV*http://msdl.microsoft.com/download/symbols".
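Both answers use the same symbol path syntax, SRV*<local folder>*<symbol server>, where the optional local folder caches symbols downloaded from the server. A tiny Python sketch that builds either form; the folder C:\symbols is a hypothetical example:

```python
from typing import Optional

MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"

def symbol_path(local_cache: Optional[str] = None, server: str = MS_SYMBOL_SERVER) -> str:
    """Build a WinDbg symbol path string."""
    if local_cache:
        # Symbols fetched from the server are cached in local_cache.
        return f"SRV*{local_cache}*{server}"
    # Server only, as in the answer above.
    return f"SRV*{server}"

print(symbol_path(r"C:\symbols"))  # hypothetical local cache folder
print(symbol_path())
```

The resulting string can be pasted into File > Symbol File Path, or set as the _NT_SYMBOL_PATH environment variable before starting WinDbg.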

Another possible cause is an infinitely recursive function. When a function recurses without end, the worker process keeps crashing, and IIS may automatically disable the related application pool.

I ran into the same issue today. I have a recursive function that lists parent projects and their subprojects. One project had been set as its own parent, so when the function tried to list all parent/subproject relationships, it recursed forever.
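One way to guard against this is to remember which projects are already on the current path and raise an error when one reappears, instead of recursing forever. A minimal Python sketch; the parent_of mapping (project id to parent id, None for a root) is a hypothetical stand-in for the real data model:

```python
from typing import Dict, List, Optional, Set

def ancestor_chain(project_id: int, parent_of: Dict[int, Optional[int]]) -> List[int]:
    """Return project_id followed by its ancestors, raising if the parent links form a cycle."""
    return _walk(project_id, parent_of, visited=set())

def _walk(project_id: int, parent_of: Dict[int, Optional[int]], visited: Set[int]) -> List[int]:
    if project_id in visited:
        # Without this check, a project that is its own (indirect) parent recurses forever.
        raise ValueError(f"cycle in parent links at project {project_id}")
    visited.add(project_id)
    parent = parent_of.get(project_id)
    if parent is None:
        return [project_id]
    return [project_id] + _walk(parent, parent_of, visited)

parent_of = {1: None, 2: 1, 3: 2, 4: 4}  # project 4 is (wrongly) its own parent
print(ancestor_chain(3, parent_of))  # [3, 2, 1]
```

Calling ancestor_chain(4, parent_of) raises ValueError immediately rather than overflowing the stack. Fixing the bad data is still necessary; the check just turns a process crash into a catchable error.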

I checked Event Viewer -> Windows Logs -> System and found:

Application pool 'DankAppPool' is being automatically disabled due to a series of failures in the process(es) serving that application pool.

Below that:

A process serving application pool 'DankAppPool' suffered a fatal communication error with the Windows Process Activation Service. The process id was '5704'. The data field contains the error number.

And:

The QueueMonitor service terminated unexpectedly. It has done this 32 time(s). The following corrective action will be taken in 60000 milliseconds: Restart the service.

At least the QueueMonitor service is a place to start.
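The "automatically disabled due to a series of failures" message comes from IIS Rapid-Fail Protection, which stops an application pool when its worker processes crash too many times within a time window. The Python sketch below is a simplified model of that rule, not IIS's actual implementation. It assumes the default settings (rapidFailProtectionMaxCrashes = 5, rapidFailProtectionInterval = 5 minutes); check the failures element for your pool in applicationHost.config:

```python
from collections import deque
from datetime import datetime, timedelta

class RapidFailModel:
    """Simplified model: disable the pool once max_crashes crashes fall within interval."""

    def __init__(self, max_crashes: int = 5, interval: timedelta = timedelta(minutes=5)):
        self.max_crashes = max_crashes
        self.interval = interval
        self.crashes: deque = deque()
        self.disabled = False

    def record_crash(self, when: datetime) -> bool:
        """Record a worker-process crash; return True if the pool is now disabled."""
        self.crashes.append(when)
        # Drop crashes that have aged out of the sliding window.
        while self.crashes and when - self.crashes[0] > self.interval:
            self.crashes.popleft()
        if len(self.crashes) >= self.max_crashes:
            self.disabled = True
        return self.disabled

t0 = datetime(2013, 8, 13, 8, 0)
fast = RapidFailModel()
print([fast.record_crash(t0 + timedelta(seconds=30 * i)) for i in range(5)])
```

Five crashes 30 seconds apart trip the rule on the fifth crash, while crashes several hours apart never fill the window under these assumed defaults.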