
We have received a native (full) crash dump file from a customer. Opening it in the Visual Studio (2005) debugger shows that the crash was caused by a realloc call that tried to allocate a ~10 MB block. The dump file was unusually large (1.5 GB -- normally they are more like 500 MB).

We therefore conclude that we have a memory "leak" or runaway allocations that either fully exhausted the memory of the process or at least fragmented it enough for the realloc to fail. (Note that this realloc was for an operation that allocated a logging buffer, and we are not surprised it failed here: 10 MB in one go would be one of the larger allocations we do, apart from some very large, pretty much unchangeable buffers. The problem itself likely has nothing to do with this specific allocation.)

Edit: After the comment exchange with Lex Li below, I should add: this is not reproducible for us (at the moment). It's just one customer dump clearly showing runaway memory consumption.

Main Question:

Now we have a dump file, but how can we locate what caused the excessive memory usage?

What we've done so far:

We have used the DebugDiag tool to analyze the dump file (the so-called Memory Pressure Analyzer), and here's what we got:

Report for DumpFM...dmp

Virtual Memory Summary
----------------------
Size of largest free VM block   62,23 MBytes 
Free memory fragmentation       81,30% 
Free Memory                     332,87 MBytes   (16,25% of Total Memory) 
Reserved Memory                 0 Bytes   (0,00% of Total Memory) 
Committed Memory                1,67 GBytes   (83,75% of Total Memory) 
Total Memory                    2,00 GBytes 
Largest free block at           0x00000000`04bc4000 

Loaded Module Summary
---------------------
Number of Modules       114 Modules 
Total reserved memory   0 Bytes 
Total committed memory  3,33 MBytes 

Thread Summary
--------------
Number of Threads       56 Thread(s) 
Total reserved memory   0 Bytes 
Total committed memory  652,00 KBytes 
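The "Free memory fragmentation" figure in the summary above appears to be the share of free virtual memory that lies outside the largest free block. That reading is an assumption (it is not taken from DebugDiag's documentation), but it reproduces the reported value from the other two numbers in the summary, up to rounding of the inputs:

```python
def free_fragmentation(largest_free_mb: float, total_free_mb: float) -> float:
    """Percentage of free VM that is NOT part of the single largest free block.

    Assumption: this is how the report's 'Free memory fragmentation' value is
    derived; it matches the numbers above but is not documented behavior.
    """
    return 100.0 * (1.0 - largest_free_mb / total_free_mb)


# Figures copied from the Virtual Memory Summary (decimal commas -> points).
frag = free_fragmentation(62.23, 332.87)
print(f"{frag:.1f}%")  # 81.3%
```

Read this way, only about a fifth of the ~333 MB of free address space sits in one contiguous 62 MB block; the rest is scattered in smaller pieces.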

This was just to get a bit of context. What's more interesting, I believe, is:

Heap Summary
------------
Number of heaps         26 Heaps 
Total reserved memory   1,64 GBytes 
Total committed memory  1,61 GBytes 

Top 10 heaps by reserved memory
-------------------------------
0x01040000           1,55 GBytes        
0x00150000           64,06 MBytes        
0x010d0000           15,31 MBytes        
...

Top 10 heaps by committed memory
--------------------------------                              
0x01040000       1,54 GBytes 
0x00150000       55,17 MBytes 
0x010d0000       6,25 MBytes  
...            

Now, looking at heap 0x01040000 (1.5 GB), we see:

Heap 5 - 0x01040000 
-------------------
Heap Name          msvcr80!_crtheap 
Heap Description   This heap is used by msvcr80 
Reserved memory      1,55 GBytes 
Committed memory     1,54 GBytes (99,46% of reserved)  
Uncommitted memory   8,61 MBytes (0,54% of reserved)  
Number of heap segments             39 segments 
Number of uncommitted ranges        41 range(s) 
Size of largest uncommitted range   8,33 MBytes 
Calculated heap fragmentation       3,27% 

Segment Information
-------------------
Base Address | Reserved Size   | Committed Size  | Uncommitted Size | Number of uncommitted ranges | Largest uncommitted block | Calculated heap fragmentation 
0x01040640        64,00 KBytes      64,00 KBytes   0 Bytes            0                              0 Bytes                     0,00% 
0x01350000     1.024,00 KBytes   1.024,00 KBytes   0 Bytes            0                              0 Bytes                     0,00% 
0x02850000     2,00 MBytes       2,00 MBytes       0 Bytes            0                              0 Bytes                     0,00% 
...

What is this Segment Information anyway?

Looking at the allocations that are listed:

Top 5 allocations by size
-------------------------
Allocation Size - 336          1,18 GBytes     
Allocation Size - 1120004      121,77 MBytes    
...

Top 5 allocations by count
--------------------------
Allocation Size - 336    3760923 allocation(s) 
Allocation Size - 32     1223794 allocation(s)  
...

We can see that the MSVCR80 heap apparently holds 3,760,923 allocations of 336 bytes each. This makes it pretty clear that we used up our memory with lots of small allocations, but how can we get more information about where these allocations came from?
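A quick sanity check of these figures, assuming DebugDiag's "GBytes" are binary units (1024^3 bytes): the 336-byte blocks alone account for the 1.18 GB in the table and roughly three quarters of the 1.54 GB committed in the CRT heap (the heap's own per-block overhead is not included). It also prints the size in hex, since WinDbg's !heap output shows sizes in hex:

```python
# Figures copied from the "Top 5 allocations" tables and the heap summary above.
BLOCK_SIZE = 336                 # bytes per allocation
BLOCK_COUNT = 3_760_923          # live allocations of that size
CRT_HEAP_COMMITTED_GIB = 1.54    # committed memory of heap 0x01040000

total_bytes = BLOCK_SIZE * BLOCK_COUNT
total_gib = total_bytes / 1024**3    # assumes the report's GBytes are GiB

print(total_bytes)                                  # 1263670128
print(f"{total_gib:.2f} GiB")                       # 1.18 GiB
print(f"{total_gib / CRT_HEAP_COMMITTED_GIB:.0%}")  # 76%
print(hex(BLOCK_SIZE))                              # 0x150
```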

If we somehow could sample some of these allocation addresses and then check where in the process image these addresses are in use, then -- assuming that a large portion of these allocations are responsible for our "leak" -- we could maybe find out where these runaway allocations came from.

Unfortunately, I really have no idea how to get more information out of the dump at the moment.

How could I inspect this heap to see some of the "336" allocation addresses?

How can I search the dump for these addresses, and how do I then find out which pointer variable (if any) in the dump holds on to these addresses?
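The idea behind this question is to scan memory for 32-bit values that point into one of the sampled blocks, which is roughly what WinDbg's s -d (search DWORD) command does over an address range. The sketch below assumes a 32-bit process (the 2 GB total and 8-digit addresses in the report suggest one) and that a region's raw bytes have already been copied out of the dump; find_references and its parameters are hypothetical names. Any DWORD that merely happens to equal an address counts as a hit, so expect some false positives:

```python
import bisect
import struct
from typing import Dict, Iterable, List


def find_references(memory: bytes, base: int, targets: Iterable[int],
                    block_size: int = 336) -> Dict[int, List[int]]:
    """Find 4-byte-aligned little-endian DWORDs that point into target blocks.

    memory     -- raw bytes of one memory region (hypothetical input)
    base       -- virtual address of memory[0]
    targets    -- start addresses of the heap blocks of interest
    block_size -- size of each target block (336 bytes in this dump)
    Returns {block_start: [addresses of slots pointing into that block]}.
    """
    starts = sorted(set(targets))
    hits: Dict[int, List[int]] = {s: [] for s in starts}
    for offset in range(0, len(memory) - 3, 4):
        (value,) = struct.unpack_from("<I", memory, offset)
        # Blocks do not overlap, so only the nearest start <= value can match.
        i = bisect.bisect_right(starts, value) - 1
        # Interior pointers count too: they keep the block reachable as well.
        if i >= 0 and value < starts[i] + block_size:
            hits[starts[i]].append(base + offset)
    return hits


# Synthetic snapshot: two pointers into a block at 0x02850010, one unrelated value.
snapshot = struct.pack("<III", 0x02850010, 0xDEADBEEF, 0x02850020)
refs = find_references(snapshot, base=0x00400000, targets=[0x02850010])
print({hex(k): [hex(a) for a in v] for k, v in refs.items()})
# {'0x2850010': ['0x400000', '0x400008']}
```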

Any tips on using DebugDiag, WinDbg, or any other tool would really help! Also, if you disagree with any of my analysis above, let us know! Thanks!

Great question, thanks for the information and walkthrough; I had a similar problem. BTW, DebugDiag is now at microsoft.com/en-us/download/details.aspx?id=40336 -- x29a
I just noticed that the above-mentioned version 2.0 doesn't support the analysis, so one should get version 1.2: microsoft.com/en-us/download/details.aspx?id=26798. If the installation fails, create a user group called "Users". -- x29a
Updating once again: the 2.x versions DO support analysis; they just split DebugDiag into multiple applications, namely DebugDiag.Analysis.exe. Furthermore, version 2.1 is now available at microsoft.com/en-us/download/details.aspx?id=42933 -- x29a

3 Answers

11
votes

You could:

  • Look into these blocks of 336 bytes to see if the content tells you anything about what allocated them. To do that, I usually use WinDbg. First run the command !heap -stat -h 0x01040000, which will show you the block sizes in use; then pass the size you are interested in to !heap -flt s size, which will list all blocks of that size. You can then look into a block with any command that displays memory (like dc).
  • Even if you cannot reproduce the problem, you can look in another dump to see what allocates blocks of that size. First activate the stack backtrace feature using the gflags.exe utility (gflags -i your.exe +ust). Then run your application, get a dump, and use !heap -flt s to list the blocks. The command !heap -p -a blockaddress will then dump the stack of functions that allocated the block.
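Once !heap -flt s has listed the 336-byte blocks, scanning a few of them for readable text (much like the Sysinternals strings tool) is often the quickest clue. WinDbg treats numbers as hex by default, so 336 bytes is 0x150, and writing 0x150 explicitly avoids any radix confusion. A minimal sketch, assuming the block bytes have already been extracted from the dump (ascii_runs and the sample data are hypothetical):

```python
import string
from typing import List

# Printable ASCII bytes, excluding tabs and line breaks (space is kept).
PRINTABLE = set(string.printable.encode()) - set(b"\t\n\r\x0b\x0c")


def ascii_runs(block: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least min_len printable ASCII characters in block."""
    runs: List[str] = []
    current = bytearray()
    for byte in block:
        if byte in PRINTABLE:
            current.append(byte)
        else:
            if len(current) >= min_len:
                runs.append(current.decode("ascii"))
            current.clear()
    if len(current) >= min_len:  # a run that reaches the end of the block
        runs.append(current.decode("ascii"))
    return runs


# Hypothetical 336-byte block, e.g. transcribed from a `dc` dump.
sample = b"\x10\x32\x40\x00" + b"LogEntry:" + b"\x00" * 3 + b"ab" + b"\x00" * 318
print(ascii_runs(sample))  # ['LogEntry:']
```

On Windows, many strings are UTF-16, so it can be worth checking the same blocks with du as well as da.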
4
votes

In WinDbg, you can try using !heap -l, which should crawl the heaps (this takes a while; there may be a way to restrict the search to a specific heap to speed it up) and find all the busy blocks that are not referenced anywhere. From there, open the memory window (Alt+5) and take a look at some of the entries that match the allocation size you suspect to be your leak. With some luck, there could be common patterns that help you identify what the data is, or better yet, some ASCII strings that you can place right away.
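Looking for common patterns across many blocks can be made systematic. If the leaked blocks are C++ objects with virtual functions, MSVC normally places the vtable pointer at offset 0, so one dominant leading DWORD often points straight at the class (resolve it with ln address in WinDbg). This is an assumption about the data, not something !heap -l reports; plain structs or string buffers will not carry such a pointer. A minimal sketch over sampled block contents (leading_dword_histogram and the sample values are hypothetical):

```python
import struct
from collections import Counter
from typing import Iterable, List, Tuple


def leading_dword_histogram(blocks: Iterable[bytes], top: int = 5) -> List[Tuple[int, int]]:
    """Count the first little-endian 32-bit value of each block, most common first.

    Blocks shorter than 4 bytes are skipped.
    """
    counts = Counter(struct.unpack_from("<I", b)[0] for b in blocks if len(b) >= 4)
    return counts.most_common(top)


# Hypothetical samples: three blocks share a leading value, one does not.
samples = ([struct.pack("<I", 0x004A1B20) + b"\x00" * 332] * 3
           + [struct.pack("<I", 0x7FFD0000) + b"\x00" * 332])
for value, n in leading_dword_histogram(samples):
    print(hex(value), n)
# 0x4a1b20 3
# 0x7ffd0000 1
```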

Unfortunately, I don't really know any other good ways except trying to reproduce it while turning on user mode stack traces with gflags and using umdh to take memory snapshots.
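The gflags + umdh approach boils down to grouping outstanding bytes by allocation stack in two snapshots and sorting by growth. umdh does this itself and produces its own text report; the sketch below only illustrates the comparison on invented data, with hypothetical stack labels standing in for real backtraces:

```python
from collections import Counter
from typing import Dict, List, Tuple


def growth_by_stack(before: Dict[str, int], after: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (stack, byte_growth) pairs with positive growth, largest first."""
    delta = Counter(after)
    delta.subtract(before)  # stacks that shrank or vanished go to <= 0
    return [(stack, n) for stack, n in delta.most_common() if n > 0]


# Hypothetical data: outstanding bytes per allocation stack in two snapshots.
snap1 = {"Logger::Append": 336 * 1_000, "Parser::Load": 32 * 500}
snap2 = {"Logger::Append": 336 * 50_000, "Parser::Load": 32 * 500, "Cache::Add": 4_096}
print(growth_by_stack(snap1, snap2))
# [('Logger::Append', 16464000), ('Cache::Add', 4096)]
```

A stack whose growth dwarfs everything else between snapshots is the natural first suspect.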

3
votes

How many dumps do you have now?

The proper way to track memory leaks is to make good use of DebugDiag's Memory and Handle Leak rule.

Then, when DebugDiag analyzes the new dumps, it can tell you more about the memory usage.