Can a page fault handler generate more page faults?

Question

I'm a bit confused about what happens when we're in user-mode and a page fault occurs.

IIRC, a page fault will be generated when the TLB is attempting to map my (user-space) virtual address into a physical address and it fails.

It then generates an exception that will be synchronously handled by the OS. But the question now is: most likely, the addresses of this exception handler code plus its associated data are also not going to be in the TLB!

Does this get recursive, or is this kernel range of memory addresses subject to different rules (for instance, an automatic mapping between virtual and physical memory so as to avoid needing the TLB)?

Thanks!

I think you are misunderstanding the relationship between the TLB and page faults. Everywhere you say "TLB" you should say "page tables [for the process]" instead. The TLB is just a cache for the page tables - missing in the TLB doesn't cause a page fault (it causes a "page walk"), but missing in the page tables does. — BeeOnRope

Peter Cordes · Accepted Answer · 2019-05-03T04:42:24

No, Linux doesn't swap out kernel memory, for this and similar reasons (for example, to be sure that a page-fault handler never has to run before some arbitrary instruction that accesses memory).

OSes that do page some of kernel memory would definitely need to keep the page-fault handler, page-tables, and disk I/O code in memory...

"this exception handler code plus its associated data are also not going to be in the TLB!"

You're conflating page walks (on a TLB miss) with page faults (the page-table entry for the virtual page is invalid or has insufficient permissions; the fault is taken after a page walk if necessary).
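To make the distinction concrete, here is a minimal toy model in C. It is only a sketch: the single-level page table, the direct-mapped 4-entry TLB, and every name in it (`mmu_t`, `translate`, `outcome_t`) are assumptions invented for illustration, not how any real MMU is organized. The point is just that a TLB miss leads to a walk, and only an invalid page-table entry leads to a fault.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KiB pages */
#define NUM_PAGES  16   /* toy address space: 16 virtual pages (assumption) */
#define TLB_SLOTS  4    /* toy direct-mapped TLB (assumption) */

typedef struct { bool present; uint32_t frame; } pte_t;
typedef struct { bool valid; uint32_t vpn; uint32_t frame; } tlb_entry_t;

typedef enum { TLB_HIT, TLB_MISS_WALK_OK, PAGE_FAULT } outcome_t;

typedef struct {
    pte_t       page_table[NUM_PAGES];
    tlb_entry_t tlb[TLB_SLOTS];
} mmu_t;

/* Translate vaddr. On success, store the physical address in *paddr. */
static inline outcome_t translate(mmu_t *m, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t off = vaddr & ((1u << PAGE_SHIFT) - 1);

    if (vpn >= NUM_PAGES)
        return PAGE_FAULT;              /* outside the toy address space */

    tlb_entry_t *t = &m->tlb[vpn % TLB_SLOTS];
    if (t->valid && t->vpn == vpn) {    /* cached translation: no walk needed */
        *paddr = (t->frame << PAGE_SHIFT) | off;
        return TLB_HIT;
    }

    /* TLB miss: walk the page table. This step alone raises no exception. */
    pte_t *pte = &m->page_table[vpn];
    if (!pte->present)
        return PAGE_FAULT;              /* only now would the OS get involved */

    *t = (tlb_entry_t){ true, vpn, pte->frame };   /* refill the TLB */
    *paddr = (pte->frame << PAGE_SHIFT) | off;
    return TLB_MISS_WALK_OK;
}
```

With virtual page 2 mapped to frame 7, the first access to that page returns `TLB_MISS_WALK_OK`, a second access returns `TLB_HIT`, and an access to an unmapped page returns `PAGE_FAULT` whether or not anything is cached.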

On x86 and most(?) other ISAs, page walks are done by hardware. See "What happens after a L2 TLB miss?".

The OS gives the CPU the physical address of the top-level page table (with mov cr3, rax for example on x86), and the CPU handles everything else transparently. (The only software TLB management is invalidation of a possibly-cached entry after modifying the page table entry in memory. e.g. x86 invlpg)
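As a rough illustration of what the hardware walker does with a virtual address, the sketch below splits an address into the per-level table indices. It assumes x86-64 4-level paging with 4 KiB pages (9 index bits per level, 12 offset bits); the helper names `pt_index` and `page_offset` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Index into the page table at a given level, assuming x86-64 4-level
 * paging with 4 KiB pages.
 * level 3 = PML4 (the top-level table whose physical address is in CR3),
 * level 2 = PDPT, level 1 = PD, level 0 = PT.
 * Each table holds 512 entries, so each index is 9 bits wide. */
static inline unsigned pt_index(uint64_t vaddr, int level)
{
    assert(level >= 0 && level <= 3);
    return (unsigned)((vaddr >> (12 + 9 * level)) & 0x1FF);
}

/* Byte offset within the 4 KiB page. */
static inline unsigned page_offset(uint64_t vaddr)
{
    return (unsigned)(vaddr & 0xFFF);
}
```

For example, `pt_index(0x401000, 1)` is 2 and `pt_index(0x401000, 0)` is 1: the walker would follow entry 0 of the top two levels, then entry 2, then entry 1, reading each table from physical memory without any software involvement.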

Hardware page-table management allows the CPU to speculatively do a page walk when a loop over an array is getting close to a page boundary, instead of waiting until an actual load touches the next page. It also lets page-walk latency be partially hidden by out-of-order execution, among other good things. Skylake even has 2 page-walk units, so it can be working on 2 TLB misses in parallel (either or both could be speculative or demand).

On an ISA with software page walks, the TLB-miss handler is separate from the page-fault handler.

On MIPS for example, there is a special range of addresses which are mapped differently from normal kernel virtual addresses:

If address starts with 0b100 [top 3 bits], translates to bottom 512 Mbytes of physical memory and does not go through TLB. (cached and unmapped). Called kseg0. Used for kernel instructions and data.

MIPS TLB handling - https://people.csail.mit.edu/rinard/teaching/osnotes/h11.html

(MIPS addresses with the high bit set can only be used by kernel code; a user-space access to one faults. i.e. a high-half kernel is baked in for MIPS.)

This is kind of like having a 512MiB hugepage mapping of the low physical memory baked into the hardware. Obviously the kernel would want to keep its page-lookup data structure in that range, but it could use any data structure it wanted, e.g. based on start/length.
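The kseg0 rule quoted above can be sketched in a few lines of C. This assumes the classic 32-bit MIPS layout, where kseg0 covers virtual addresses 0x80000000 to 0x9FFFFFFF; the helper names `in_kseg0` and `kseg0_to_phys` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KSEG0_BASE 0x80000000u   /* first kseg0 virtual address (MIPS32) */

/* True if the top three bits of the address are 0b100. */
static inline bool in_kseg0(uint32_t vaddr)
{
    return (vaddr >> 29) == 0x4;
}

/* Fixed translation: no TLB lookup, just drop the segment bits.
 * The result is always in the bottom 512 MiB of physical memory. */
static inline uint32_t kseg0_to_phys(uint32_t vaddr)
{
    assert(in_kseg0(vaddr));
    return vaddr - KSEG0_BASE;
}
```

Because the translation is a fixed subtraction, code and data placed in this range (such as a software TLB-miss handler and the structures it reads) can never themselves miss in the TLB.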
