1
votes

I've been reading up a little on zero-pause garbage collectors for managed languages. From what I understand, one of the most difficult things to do without stop-the-world pauses is heap compaction. Only very few collectors (eg Azul C4, ZGC) seem to be doing, or at least approaching, this.

So, most GCs introduce dreaded stop-the-world pauses the compact the heap (bad!). Not doing this seems extremely difficult, and does come with a performance/throughput penalty. So either way, this step seems rather problematic.

And yet - as far as I know, most if not all GCs still do compact the heap occasionally. I've yet to see a modern GC that doesn't do this by default. Which leads me to believe: It has to be really, really important. If it wasn't, surely, the tradeoff wouldn't be worth it.

At the same time, I have never seen anyone do memory defragmentation in C++. I'm sure some people somewhere do, but - correct me if I am wrong - it does not at all seem to be a common concern. I could of course imagine static memory somewhat lessens this, but surely, most codebases would do a fair amount of dynamic allocations?!

So I'm curious, why is that?

Are my assumptions (very important in managed languages; rarely done in C++) even correct? If yes, is there any explanation I'm missing?

2
You cannot do heap compaction in C++ because that changes the addresses of objects that people have stored pointers to. - Botje
It is very important. Long-running C++ processes can and do suffer from memory fragmentation. Modern memory allocators (like tcmalloc or jemalloc) try to prevent these problems by using several allocation buckets for requests of the same size and managing a free list to make sure holes in these buckets are filled first before asking for extra memory from the OS. Managed languages have an extra incentive to do so because the fewer memory pages your GC has to scan in the mark phase, the faster it runs. - Botje
C++ uses the operating systems memory allocation system (abstracted through new/delete and malloc/free). And if the operating system uses virtual memory you really can't compact the "heap". The reason is that all you have is virtual addresses, and compacting virtual memory might not move data as you expect in the physical memory. - Some programmer dude
If you're interested in this kind of stuff, read the jemalloc paper or the mimalloc paper as an intro to modern advanced memory management. - Botje
Another technique used in C/C++ programs: memory arenas for short-lived, bursty allocations that can be cleaned up in one go. That also prevents memory fragmentation. - Botje

2 Answers

2
votes

Garbage collection can compact the heap because it knows where all of the pointers are. After all, it just finished tracing them. That means that it can move objects around and adjust the pointers (references) to the new location.

However, C++ cannot do that, because it doesn't know where all the pointers are. If the memory allocation library moved things around, there could be dangling pointers to the old locations.

Oh, and for long running processes, C++ can indeed suffer from memory fragmentation. This was more of a problem on 32-bit systems because it could fail to allocate memory from the OS, because it might have used up all of the available 1 MB memory blocks. In 64-bit it is almost impossible to create so many memory mappings that there is nowhere to put a new one. However, if you ended up with a 16 byte memory allocation in each 4K memory page, that's a lot of wasted space.

C and C++ applications solve that by using storage pools. For a web server, for example, it would start a pool with a new request. At the end of that web request, everything in the pool gets destroyed. The pool makes a nice, constant sized block of RAM that gets reused over and over without fragmentation.

Garbage collection tends to use recycling pools as well, because it avoids the strain of running a big GC trace and reclaim at the end of a connection.

One method some old operating systems like Apple OS 9 used before virtual memory was a thing is handles. Instead of a memory pointer, allocation returned a handle. That handle was a pointer to the real object in memory. When the operating system needed to compact memory or swap it to disk it would change the handle.

I have actually implemented a similar system in C++ using an array of handles into a shared memory map psuedo-database. When the map was compacted then the handle table was scanned for affected entries and updated.

1
votes

Generic memory compaction is not generally useful nor desirable because of its costs.

What may be desirable is to have no wasted/fragmented memory and that can be achieved by other methods than memory compaction.

In C++ one can come up with a different allocation approach for objects that do cause fragmentation in their specific application, e.g. double-pointers or double-indexes to allow for object relocation; object pools or arenas that prevent or minimize fragmentation. Such solutions for specific object types is superior to generic garbage collection because they employ application/business specific knowledge which allows to minimize the scope/cost of object storage maintenance and also happen at most appropriate times.

A research found that garbage collected languages require 5 times more memory to achieve performance of non-GC equivalent programs. Memory fragmentation is more severe in GC languages.