5
votes

When the OS loads a process into memory, it initializes the stack pointer to the virtual address where it has decided the stack should go in the process's virtual address space, and program code uses this register to locate stack variables. My question is: how does malloc() know the virtual address at which the heap starts? Does the heap always sit at the end of the data segment, and if so, how does malloc() know where that is? Is it even one contiguous area of memory, or is it interspersed with other global variables in the data section?

3
It can be hard-coded. - user3920237
I imagine that would be platform specific. Are you interested in getting an answer for a specific platform? - R Sahu
@RSahu Let's say Linux. - mclaassen
Are you satisfied knowing that the OS would know this information, and malloc() simply asks? - jxh

3 Answers

11
votes

malloc implementations are dependent on the operating system; so is the process that they use to get the beginning of the heap. On UNIX, this can be accomplished by calling sbrk(0) at initialization time. On other operating systems the process is different.
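As a rough illustration of the sbrk(0) idea, here is a minimal sketch assuming a Linux/glibc system where sbrk() is still available. The helper names current_break and grow_break are made up for this example and are not part of any libc.

```c
#define _DEFAULT_SOURCE   /* ask glibc to declare sbrk() in <unistd.h> */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Return the current program break, i.e. the end of the brk-managed heap.
 * An increment of 0 only queries the break, it does not move it. */
void *current_break(void)
{
    return sbrk(0);
}

/* Grow the break by `bytes` and return how far it moved, or -1 on failure.
 * Hypothetical helper for illustration; a real allocator does more bookkeeping. */
long grow_break(intptr_t bytes)
{
    assert(bytes >= 0);              /* this sketch only grows the heap */

    char *before = sbrk(bytes);      /* sbrk returns the previous break */
    if (before == (char *)-1)
        return -1;

    char *after = sbrk(0);           /* query the new break */
    return (long)(after - before);
}
```

An allocator built this way could call current_break() once at start-up and treat the result as the beginning of its heap; everything it later obtains through sbrk lies above that address.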

Note that you can implement malloc without knowing the location of the heap. You can initialize the free list to NULL, and call sbrk or a similar function with the allocation size each time a free element of the appropriate size is not found.
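The lazy approach described above can be sketched roughly as follows. This is a toy, not how any real libc does it: the names toy_malloc and toy_free are invented for the example, it assumes a Linux/glibc system with sbrk() and a C11 compiler, it is not thread-safe, and it never splits or merges blocks.

```c
#define _DEFAULT_SOURCE   /* ask glibc to declare sbrk() in <unistd.h> */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Header placed in front of every block; padded so payloads stay 16-byte aligned. */
typedef struct block {
    _Alignas(16) size_t size;   /* usable bytes following the header */
    struct block *next;         /* next free block; only meaningful on the free list */
} block_t;

/* The free list starts out empty: no knowledge of where the heap begins is needed. */
static block_t *free_list = NULL;

void *toy_malloc(size_t size)
{
    if (size == 0 || size > SIZE_MAX / 2)
        return NULL;
    size = (size + 15u) & ~(size_t)15u;          /* round up to a multiple of 16 */

    /* First fit: reuse a previously freed block if one is large enough. */
    for (block_t **link = &free_list; *link != NULL; link = &(*link)->next) {
        if ((*link)->size >= size) {
            block_t *b = *link;
            *link = b->next;                     /* unlink it from the free list */
            return b + 1;
        }
    }

    /* Nothing suitable: ask the OS for more, padding so the header is aligned. */
    uintptr_t cur = (uintptr_t)sbrk(0);
    size_t pad = (16 - cur % 16) % 16;
    void *mem = sbrk((intptr_t)(pad + sizeof(block_t) + size));
    if (mem == (void *)-1)
        return NULL;

    block_t *b = (block_t *)((char *)mem + pad);
    b->size = size;
    b->next = NULL;
    assert((uintptr_t)(b + 1) % 16 == 0);        /* payload alignment check */
    return b + 1;
}

void toy_free(void *p)
{
    if (p == NULL)
        return;
    block_t *b = (block_t *)p - 1;               /* step back to the header */
    b->next = free_list;                         /* push onto the free list */
    free_list = b;
}
```

Freeing a block and then requesting something no larger hands the same address back, because the block is found on the free list before sbrk is ever consulted again.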

2
votes

This is only about Linux implementations of malloc.

Many malloc implementations on Linux or POSIX systems use the mmap(2) syscall to get a fairly large range of memory; they can then use munmap(2) to release it.

(It looks like sbrk(2) might not be used much any more; in particular, it is not ASLR-friendly and might not be multi-thread friendly.)

Both these syscalls may be quite expensive, so some implementations ask for memory (using mmap) in fairly large chunks (e.g. in chunks of one or a few megabytes). Then they manage free space as e.g. linked lists of blocks, etc. They will handle small mallocs and large mallocs differently.

The mmap syscall usually does not hand out memory ranges starting at some fixed address (notably because of ASLR).
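To make the "grab a big chunk, then carve it up" idea concrete, here is a minimal sketch of a bump arena on top of an anonymous mmap. The type arena_t, the functions arena_init, arena_alloc and arena_release, and the 1 MiB size are all assumptions made for this example; real allocators keep free lists and size classes on top of something like this.

```c
#define _DEFAULT_SOURCE   /* ask glibc to define MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define ARENA_SIZE ((size_t)1 << 20)   /* 1 MiB per chunk, an arbitrary choice */

typedef struct {
    char  *base;   /* start of the mapped chunk */
    size_t used;   /* bytes already handed out */
    size_t size;   /* total bytes in the chunk */
} arena_t;

/* Map one large anonymous chunk; the kernel picks the address. Returns 0 or -1. */
int arena_init(arena_t *a)
{
    assert(a != NULL);
    void *p = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->used = 0;
    a->size = ARENA_SIZE;
    return 0;
}

/* Hand out the next n bytes (rounded up to 16) without any further syscall. */
void *arena_alloc(arena_t *a, size_t n)
{
    if (n > SIZE_MAX - 15)
        return NULL;
    n = (n + 15) & ~(size_t)15;
    if (n > a->size - a->used)
        return NULL;                   /* chunk exhausted */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* Give the whole chunk back to the kernel. Returns munmap's result. */
int arena_release(arena_t *a)
{
    int rc = munmap(a->base, a->size);
    a->base = NULL;
    a->used = a->size = 0;
    return rc;
}
```

Only arena_init and arena_release enter the kernel here; every arena_alloc in between is plain pointer arithmetic, which is the reason for requesting memory in large chunks.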

Try running a simple program on your system that prints the result of a single malloc (of e.g. 128 int-s). You will probably observe different addresses from one run to the next (because of ASLR). Running it under strace(1) is also very instructive. Try also cat /proc/self/maps (or print the lines of /proc/self/maps inside your program). See proc(5).

So there is no need to "start" the heap at some address, and on many systems that does not even make sense. The kernel hands out segments of virtual address space at randomized page addresses.
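A minimal sketch of that experiment, assuming Linux (for /proc/self/maps); the function names print_malloc_address and dump_maps are made up for this example. Call both from main(), run the program a few times, and compare the printed address with the ranges listed in the maps output.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate 128 ints, print where they landed, and free them again.
 * Returns 0 on success, -1 if malloc failed. */
int print_malloc_address(FILE *out)
{
    assert(out != NULL);
    int *p = malloc(128 * sizeof *p);
    if (p == NULL)
        return -1;
    fprintf(out, "malloc returned %p\n", (void *)p);
    free(p);
    return 0;
}

/* Copy this process's memory map to `out`.
 * Returns the number of lines copied, or -1 if the file cannot be opened. */
int dump_maps(FILE *out)
{
    assert(out != NULL);
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char buf[512];
    int lines = 0;
    while (fgets(buf, sizeof buf, maps) != NULL) {
        fputs(buf, out);
        if (strchr(buf, '\n') != NULL)   /* count only completed lines */
            lines++;
    }
    fclose(maps);
    return lines;
}
```

On a typical glibc system the printed address should fall inside the range marked [heap] for a small request like this one, but that is an observation about one implementation, not a guarantee.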

BTW, both GNU libc and musl libc are free software. You should look inside the source code of their malloc implementations. I find the source code of musl libc very readable.

1
votes

On Windows, you use the Heap functions to get the process heap memory. The C runtime will allocate memory blocks on the heap using HeapAlloc and then use that to fulfil malloc requests.