
I'm studying how virtual memory works, and I'm not sure what happens if I load a big file (one that is still smaller than physical memory) with fread() or a similar call.

As far as I understand, the operating system might not allocate the entire corresponding physical memory. Instead, it could wait until a page fault is triggered as my program reads a specific portion of the file (a portion not yet mapped to physical memory).

This is basically the behavior of a memory mapped file. So, if my assumptions are correct, what is the benefit of using system calls like mmap()? Just to avoid the usual for-loop dance when reading with fread(), maybe?


1 Answer


read() and fread() read the amount you specify into the buffer you provide. mmap() is a separate interface into the kernel's file cache. The two intersect in that the kernel will most likely first read the file into its cache buffers, then copy the requested parts of those cache buffers into your user buffer.
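
To make the copy concrete, here is a minimal sketch of loading a whole file with fread(). The helper name slurp_file is made up for illustration. By the time fread() returns, every requested byte has been copied into the malloc'd buffer, so all of that buffer's pages have been written to, regardless of which parts the program reads later.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire file into a heap buffer with fread().
 * Returns the buffer (caller must free it) and stores its size in *out_len,
 * or returns NULL on failure. */
unsigned char *slurp_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Find the file size by seeking to the end. */
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    /* Allocate at least one byte so an empty file still yields a valid pointer. */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* The data is copied from the kernel's cache into buf here. */
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) {
        free(buf);
        return NULL;
    }

    *out_len = got;
    return buf;
}
```

The buffer returned here is a private snapshot: later changes to the file do not show up in it, and the memory is only released when the caller calls free().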

This double copy is often necessary because your program's buffer usually doesn't have the alignment and block size the underlying device requires, and because data that needs transformation (decryption, decompression) needs a place to be transformed from.

This kernel cache is kept coherent with the file, so system-wide reads and writes go through it. If you mmap() the file, you may be able to avoid the double copy, but you then have to deal with changes to the file appearing unannounced.
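
For comparison, a minimal sketch of the mmap() route on a POSIX system. The helper name map_file is made up for illustration, and using MAP_SHARED with PROT_READ is just one reasonable choice. No user buffer is allocated and nothing is copied up front; pages are faulted in as the program touches them.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
 * Returns the mapping and stores its length in *out_len, or returns NULL
 * on failure (including empty files, since a zero-length mapping is not
 * allowed). Release the mapping with munmap(ptr, len). */
const unsigned char *map_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* No data is copied here; the kernel only sets up the mapping. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *out_len = (size_t)st.st_size;
    return p;
}
```

With a shared mapping like this, the program is typically looking at the kernel's cached pages directly, which is why writes made to the file by other processes can show up in the mapped region without any notification. If the file is truncated while mapped, touching pages past the new end can raise SIGBUS, so this approach suits files that are not expected to change underneath you.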