2
votes

I'm writing a 32-bit .NET program with a 2 stage input process:

  1. It uses native C++ via C++/CLI to parse an indefinite number of files into corresponding SQLite databases (all with the same schema). The allocations made by C++ 'new' will typically consume up to 1GB of the virtual address space (out of the 2GB available; I'm aware of the 3GB extension, but that'll just delay the issue).

  2. It uses complex SQL queries (run from C#) to merge the databases into a single database. I set the cache_size to 1GB for the merged database so that the merging part has minimal page faults.

My problem is that the cache in stage 2 does not re-use the 1GB of memory allocated by 'new' and properly released by 'delete' in stage 1. I know there's no leak because immediately after leaving stage 1, 'private bytes' drops down to a low amount like I'd expect. 'Virtual size' however remains at about the peak of what the C++ used.

This non-sharing between the C++ and SQLite cache causes me to run out of virtual address space. How can I resolve this, preferably in a fairly standards-compliant way? I really would like to release the memory allocated by C++ back to the OS.

4
why don't you just run it on a 64-bit OS? - jalf
C++/CLI is not "native C++". It's the .NET version of C++ and presumably uses the garbage collector like all other .NET based languages. Garbage collectors are convenient, but their downfall is that you lose control over when memory gets released. - Mark Ransom
I'm not sure if it'll fix your problem, but may help: take a look at GC.AddMemoryPressure (msdn.microsoft.com/en-us/library/…). This allows you to instruct the CLR about native memory usage so it can better schedule GC activity. - Nathan Ernst
@Mark: at least as I read it, he's using C++/CLI to "bridge" between the majority of the program (in C#) and the native C++. - Jerry Coffin
What @Jalf said - Add more virtual memory! It's free! - Bo Persson

4 Answers

4
votes

This is not something you can control effectively from the C++ level of abstraction (in other words, you cannot know for sure whether memory that your program released to the C++ runtime will be released to the OS). Using special allocation policies and non-standard extensions to try to handle the issue probably won't work anyway, because you cannot control how the external libraries you use deal with memory (e.g. if they have cached data).

A possible solution would be to move the C++ part to an external process that terminates once the SQLite databases have been created. Having an external process will introduce some annoyance (e.g. it's a bit harder to keep "live" control over what happens), but it also opens up more possibilities, such as parallel processing even if the libraries do not support multithreading, or using multiple machines over a network.
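As a rough sketch of the external-process idea, the host program could launch a separate parser executable once per input file and wait for it to exit. The executable name and the helper functions `build_parser_command` and `run_parser` are hypothetical, and the sketch assumes that none of the paths contain spaces or shell metacharacters:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: builds the command line for a separate parser
// executable. Assumes the paths contain no spaces or shell metacharacters.
std::string build_parser_command(const std::string& parser_exe,
                                 const std::string& input_file,
                                 const std::string& output_db) {
    return parser_exe + " " + input_file + " " + output_db;
}

// Runs the parser as a child process and blocks until it finishes.
// The return value is whatever std::system reports (implementation-defined).
// When the child exits, the OS reclaims its whole address space, so none of
// the memory it allocated with 'new' stays reserved in the host process.
int run_parser(const std::string& parser_exe,
               const std::string& input_file,
               const std::string& output_db) {
    const std::string command =
        build_parser_command(parser_exe, input_file, output_db);
    return std::system(command.c_str());
}
```

From the C# side the same thing would more likely be done with the .NET process API; the point is only that stage 1 lives and dies in its own address space.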

2
votes

Since you're interoperating with C++/CLI, you're presumably using Microsoft's compiler.

If that's the case, then you probably want to look up _heapmin. After you exit from your "stage 1", call it, and it'll release blocks of memory held by the C++ heap manager back to the OS, if the complete block that was allocated from the OS is now free.
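A minimal sketch of how that call could be placed, assuming the Microsoft CRT (where `_heapmin` is declared in `<malloc.h>` and returns 0 on success). `run_stage1` is only a hypothetical stand-in for the real parsing stage, and `release_heap_to_os` is an illustrative wrapper, not part of any library:

```cpp
#include <cstddef>
#include <memory>
#include <vector>
#ifdef _MSC_VER
#include <malloc.h>  // _heapmin (Microsoft CRT only)
#endif

// Hypothetical stand-in for "stage 1": allocates blocks with new[] and
// frees them all before returning. Returns the total number of bytes
// that were allocated.
std::size_t run_stage1(std::size_t blocks, std::size_t block_size) {
    std::vector<std::unique_ptr<char[]>> buffers;
    buffers.reserve(blocks);
    for (std::size_t i = 0; i < blocks; ++i) {
        buffers.emplace_back(new char[block_size]);
    }
    return blocks * block_size;
}  // every buffer is delete[]'d here when the vector goes out of scope

// Asks the CRT heap to hand unused memory back to the OS.
// Returns true only if _heapmin is available and reported success.
bool release_heap_to_os() {
#ifdef _MSC_VER
    return _heapmin() == 0;
#else
    return false;  // _heapmin does not exist outside the Microsoft CRT
#endif
}
```

Calling `release_heap_to_os()` right after `run_stage1(...)` returns, and before opening the merged database, is where this would fit in the two-stage program. Whether 'virtual size' actually drops still depends on how fragmented the heap is at that point.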

0
votes

On Linux, we used google malloc (http://code.google.com/p/google-perftools/). It has a function to release the free memory to the OS: MallocExtension::instance()->ReleaseFreeMemory().

In theory, tcmalloc works on Windows, but I have never personally used it there.

0
votes

You could allocate the buffer from the GC heap in C#, pin it, use it from the native code, and then unpin and release it, so that the GC can free it, compact the heap, and re-use the memory.