GFlags setting to catch heap corruption (other than Page Heap)?

Question

On one production site our application (*) crashes repeatedly, but non-reproducibly. Analyzing the crash dumps clearly shows that it's a heap corruption: the crashes are at different locations, but they are always access violations inside kernel32!HeapFree/ntdll!RtlpLowFragHeapFree. WinDbg's !analyze -v also reports a heap corruption.

What we have tried so far is to run the application with the GFlags option Page Heap. The problem is that the memory overhead of Page Heap is so large that the application no longer runs (it hits the virtual memory limit of a 32-bit process).
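A rough model shows why full Page Heap can exhaust a 32-bit address space so quickly. The sketch below assumes that full Page Heap rounds every allocation up to whole 4 KiB pages and adds one inaccessible guard page behind it. It ignores heap metadata and fragmentation, and the function names are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Page size on x86/x64 Windows.
constexpr std::uint64_t kPageSize = 4096;

// Approximate virtual memory consumed by ONE allocation under full Page Heap:
// the request rounded up to whole pages, plus one guard page.
// (Assumption: metadata and fragmentation are ignored.)
constexpr std::uint64_t full_page_heap_footprint(std::uint64_t request_bytes) {
    std::uint64_t data_pages = (request_bytes + kPageSize - 1) / kPageSize;
    if (data_pages == 0) {
        data_pages = 1;  // even a zero-byte request occupies a page
    }
    return (data_pages + 1) * kPageSize;
}

// Upper bound on the number of simultaneously live allocations of a given
// size that fit into the given amount of address space.
constexpr std::uint64_t max_live_allocations(std::uint64_t address_space_bytes,
                                             std::uint64_t request_bytes) {
    return address_space_bytes / full_page_heap_footprint(request_bytes);
}
```

Under these assumptions a 32-byte allocation costs 8 KiB of address space, so the default 2 GiB user address space of a 32-bit process holds at most about 262,144 live small allocations, before counting code, stacks, or anything else.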

So, we cannot use Page Heap. Which other flags would be useful to add so that we either

get a crash at the corruption site
or at least can get more info out of a crash dump that will eventually be generated when we crash in HeapFree?

We are currently trying out the flags:

in the hopes that the next crash dump will contain some more information of what went wrong.

I considered these flags, but left them out for now:

Enable heap parameter checking ... I would expect quite some overhead when the system checks every time a heap function is called
Enable heap free checking ... not sure whether this would actually buy me anything
Enable heap validation on call ... here even the docs warn of the high overhead

One problem I (also) have is that I'm unsure how these flags help when a memory corruption occurs. Page Heap obviously will generate an access violation when something writes into the guard pages, but how do the other flags operate?

Do I have to run the app with Application Verifier for these other flags to help? Or will an exception be raised when the checking code detects something?

Which combination of these flags makes most sense so that the application can still run with OK performance and memory consumption in production?

(*) It's a 32-bit Windows desktop application in industrial automation, running on 64-bit Windows 7 in this case (which it does just fine at a whole lot of other sites).

Actually, I believe that the Page Heap option would be your best bet. If you haven't already done so, you could try to make your process large address aware. Hopefully that gives you enough memory to actually use the flag. - Lieven Keersmaekers

xMRi · Accepted Answer · 2013-09-26T13:28:48

IMHO the easiest way to control all this checking is to use Application Verifier. It has a good UI, and you can play around with all the flags.
Heap free checking is a good flag without too much overhead. If a heap block has been corrupted, you get a break into the debugger when that block is freed. If the corruption occurs close to the allocation or freeing of the block, this might help.
AFAIK "heap parameter checking" is just a lightweight "heap validation on call". I never had any success with it.
Heap tail checking and tagging is easy and fast. It works sometimes for me.
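To illustrate the idea behind tail checking and free checking, here is a minimal, portable sketch. It is not how ntdll implements these flags. It assumes a simple scheme in which the allocator writes a known fill pattern directly behind each block and verifies it when the block is freed. All names, the fill byte, and the tail length are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Illustrative values only; the real heap uses its own layout and patterns.
constexpr unsigned char kTailFill = 0xAB;
constexpr std::size_t kTailSize = 8;

// Bookkeeping stored in front of the user block; aligned so that the
// pointer handed to the caller keeps malloc's alignment.
struct alignas(std::max_align_t) Header {
    std::size_t user_size;
};

// Allocate `size` user bytes, followed by a tail filled with kTailFill.
void* checked_alloc(std::size_t size) {
    void* raw = std::malloc(sizeof(Header) + size + kTailSize);
    if (raw == nullptr) {
        return nullptr;
    }
    new (raw) Header{size};
    unsigned char* user = static_cast<unsigned char*>(raw) + sizeof(Header);
    std::memset(user + size, kTailFill, kTailSize);
    return user;
}

// Returns true if the bytes behind the user block still hold the pattern.
bool tail_intact(const void* p) {
    const unsigned char* user = static_cast<const unsigned char*>(p);
    const Header* hdr = reinterpret_cast<const Header*>(user - sizeof(Header));
    const unsigned char* tail = user + hdr->user_size;
    for (std::size_t i = 0; i < kTailSize; ++i) {
        if (tail[i] != kTailFill) {
            return false;
        }
    }
    return true;
}

// Free the block and report whether the tail was intact at that moment.
// A real checker would break into the debugger instead of returning false.
bool checked_free(void* p) {
    if (p == nullptr) {
        return true;
    }
    const bool ok = tail_intact(p);
    std::free(static_cast<unsigned char*>(p) - sizeof(Header));
    return ok;
}
```

The sketch also shows the limitation mentioned above: an overrun is detected only when the block is freed, not when the bad write happens, so the resulting dump points at the victim block rather than at the code that corrupted it.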

You can also control this on a per-application basis with gflags.

gflags.exe /i Testapp.exe e0

But: the best way to find such problems is to use the Debug CRT throughout, if that is possible for you. So if there is a chance to run your debug build in the production environment, do it. The Debug CRT again offers a lot of flags that you can set.
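As a hedged sketch of what setting such Debug CRT flags can look like, the helper below uses `_CrtSetDbgFlag` from `<crtdbg.h>` to ask the debug heap to validate itself periodically. The helper name and the choice of flags are assumptions for illustration. It only has an effect in an MSVC debug build, and it only covers memory allocated through the CRT (`malloc`, `new`), not blocks obtained directly from `HeapAlloc`.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#include <crtdbg.h>
#endif

// True only when compiled against the MSVC Debug CRT.
#if defined(_MSC_VER) && defined(_DEBUG)
constexpr bool kHasDebugCrt = true;
#else
constexpr bool kHasDebugCrt = false;
#endif

// Hypothetical helper: turn on debug heap bookkeeping and have the CRT
// validate the whole heap on every 1024th allocation request.
// Returns true if the flags were actually applied.
bool enable_debug_crt_heap_checks() {
#if defined(_MSC_VER) && defined(_DEBUG)
    int flags = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);  // read current flags
    flags |= _CRTDBG_ALLOC_MEM_DF;                    // debug heap bookkeeping
    // The upper 16 bits hold the check frequency; replace them.
    flags = (flags & 0x0000FFFF) | _CRTDBG_CHECK_EVERY_1024_DF;
    _CrtSetDbgFlag(flags);
    return true;
#else
    return false;  // no Debug CRT available in this build
#endif
}
```

Calling this once at startup is enough. A lower frequency such as `_CRTDBG_CHECK_EVERY_1024_DF` trades detection latency for speed, whereas `_CRTDBG_CHECK_ALWAYS_DF` validates on every allocation and deallocation and is much slower.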
