29
votes

I always assumed a heap (data structure) is used to implement a heap (dynamic memory allocation), but I've been told I'm wrong.

How are heaps (for example, the one implemented by typical malloc routines, or by Windows's HeapCreate, etc.) implemented, typically? What data structures do they use?

What I'm not asking:

While searching online, I've seen tons of descriptions of how to implement heaps with severe restrictions.
To name a few, I've seen lots of descriptions of how to implement:

  • Heaps that never release memory back to the OS (!)
  • Heaps that only give reasonable performance on small, similarly-sized blocks
  • Heaps that only give reasonable performance for large, contiguous blocks
  • etc.

And it's funny, they all avoid the harder question:
How are "normal", general-purpose heaps (like the one behind malloc, HeapCreate) implemented?

What data structures (and perhaps algorithms) do they use?

2
Yes the two kind of heap are different. To learn about dynamic memory allocation google for dlmalloc (used by glibc) or tcmalloc (used by google). - brian beuning
@brianbeuning: Will take a look at dlmalloc, thanks. But TCMalloc currently does not return any memory to the system. - user541686
@Mehrdad: Yes. Most (all?) Unix based mallocs do not ever return memory to the system. - Billy ONeal
I don't think the C++ and C tags are appropriate here, but I'm having trouble thinking of better ones. - Cory Nelson
Recent versions of dlmalloc have a cool feature called mspaces. You use malloc() and free() on an mspace, or you can delete the mspace and free all memory allocated in the mspace. We are using this in our application server so each web session gets its own mspace. The mspace greatly improves page and cache locality, and if our code has any memory leak bug deleting the mspace fixes the leak. Our sessions uses one thread so mspaces can help the multi-threading issues recent allocators address. - brian beuning

2 Answers

14
votes

Allocators tend to be quite complex and often differ significantly in how they're implemented.

You can't really describe them in terms of one common data structure or algorithm, but there are some common themes:

  1. Memory is taken from the system in large chunks -- often megabytes at a time.
  2. These chunks are then split up into various smaller chunks as you perform allocations. Not exactly the same size as you allocate, but usually in certain ranges (200-250 bytes, 251-500 bytes, etc.). Sometimes this is multi-tiered, where you'd have an additional layer of "medium chunks" which come before your actual requests.
  3. Controlling which "large chunk" to break a piece off of is a very difficult and important thing to do -- this greatly affects memory fragmentation.
  4. One or more free pools (aka "free list", "memory pool", "lookaside list") are maintained for each of these ranges. Sometimes even thread-local pools. This can greatly speed up a pattern of allocating/deallocating many objects of similar size.
  5. Large allocations are treated a bit differently so as to not waste a lot of RAM and not be pooled quite so much if at all.

If you wanted to check out some source code, jemalloc is a modern high-performance allocator and should be representative in complexity of other common ones. TCMalloc is another common general-purpose allocator, and their website goes into all the gory implementation details. Intel's Thread Building Blocks has an allocator built specifically for high concurrency.

One interesting difference can be seen between Windows and *nix. In *nix, the allocator has very low-level control over the address space an app uses. In Windows, you basically have a course-grained, slow allocator VirtualAlloc to base your own allocator off of.

This results in *nix-compatible allocators typically directly giving you an malloc/free implementation where it's assumed you'll only use one allocator for everything (otherwise they'd trample each-other), while Windows-specific allocators provide additional functions, leaving malloc/free alone, and can be used in harmony (for instance, you can use HeapCreate to make private heaps which can work alongside others).

In practice, this trade in flexibility gives *nix allocators a small leg up performance-wise. It's very rare to see an app intentionally use multiple heaps on Windows -- mostly it's by accident due to different DLLs using different runtimes which each have their own malloc/free, and can cause a lot of headaches if you're not diligent in tracking which heap some memory came from.

6
votes

Note: The following answer assumes you're using a typical, modern system with virtual memory. The C and C++ standards do not require virtual memory; therefore of course you can't rely on such assumptions on hardware without this feature (e.g. GPUs typically don't have this feature; nor do extremely small hardware like the PIC).


This depends on the platform you're using. Heaps can be very complicated beasts; they don't use only a single data structure; and there is no "standard" data structure. Even where the heap code is located is different depending on the platform. For instance, the heap code is typically provided by the C Runtime on Unix boxes; but is typically provided by the operating system on Windows.

  1. Yes, this is common on Unix machines; due to the way *nix's underlying APIs and memory model operate. Basically, the standard API to return memory to the operating system on these systems only allows returning memory on the "frontier" between where user memory is allocated and the "hole" in between user memory and system facilities like the stack. (The API in question is brk or sbrk). Instead of returning memory to the operating system, many heaps only try to reuse memory no longer in use by the program proper, and don't try to return memory to the system. This is less common on Windows, because its equivalent to sbrk (VirtualAlloc) doesn't have this limitation. (But like sbrk, it is very expensive and has caveats like only allocating page-sized and page-aligned chunks. So heaps try to call either as rarely as possible)
  2. This sounds like a "block allocator", which divides the memory into fixed size chunks, and then just return one of the free chunks. To my (albeit limited) understanding, Windows' RtlHeap maintains a number of data structures like this for different known block sizes. (E.g. it'll have one for blocks of size 16, for instance) RtlHeap calls these "lookaside lists".
  3. I don't really know of a specific structure that handles this case well. Large blocks are problematic for most allocation systems because they cause fragmentation of the address space.

The best reference I've found discussing the common allocation strategies employed on major platforms is the book Secure Coding in C and C++, by Robert Seacord. All of chapter 4 is dedicated to heap data structures (and problems caused when users use said heap systems incorrectly).