Since the virtual address space (VAS) is separate for each process and can be up to 4 GB in size, is the whole VAS of a process loaded into the CPU context on a context switch?
Also, since each process has a separate page table, is the page table also brought into the CPU context on a context switch?
These questions are related. You swap out one virtual address space for another by changing which set of page tables performs the virtual -> physical translation. That's how the address space swap is accomplished.
Let's consider a very simple example.
- Say we have two processes PA and PB. Both processes are executing their program image at virtual address 0x1000.
- Not visible to the processes are a set of page tables, which map the virtual address space to physical pages of RAM:
- Pagetable TA maps virtual address 0x1000 to physical address 0x88000
- Pagetable TB maps virtual 0x1000 to physical 0x99000.
- Let's say the theoretical CPU has a register called PP (pagetable pointer).
After the processes have been initialized, "swapping the virtual address space" between the two is simple. To load the address space for PA, you simply put the address of TA in PP, and now that process "sees" the memory at 0x88000. Likewise, you put the address of TB in PP for PB, so it will "see" the memory at 0x99000.
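The swap described above can be sketched as a toy simulation. This is only a model, assuming single-level page tables represented as Python dicts; the names `ToyCPU`, `pp`, and `translate` are made up for illustration and do not correspond to any real hardware or OS API.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages

# Hypothetical single-level page tables: virtual page base -> physical page base
TA = {0x1000: 0x88000}  # page table for process PA
TB = {0x1000: 0x99000}  # page table for process PB


class ToyCPU:
    """A made-up CPU with one register, PP, pointing at the active page table."""

    def __init__(self):
        self.pp = None  # the "pagetable pointer" register

    def translate(self, vaddr):
        """Translate a virtual address using whichever table PP points at."""
        page = vaddr & ~(PAGE_SIZE - 1)
        offset = vaddr & (PAGE_SIZE - 1)
        return self.pp[page] + offset


cpu = ToyCPU()

cpu.pp = TA                       # "switch to PA"
pa_view = cpu.translate(0x1000)   # PA's view of virtual 0x1000

cpu.pp = TB                       # "switch to PB": only the pointer changes
pb_view = cpu.translate(0x1000)   # same virtual address, different physical page

print(hex(pa_view), hex(pb_view))
```

Note that nothing is copied when switching: neither the address space contents nor the page tables move, only the value in the register changes.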
When switching between threads (of the same process), the virtual address space does not need to be changed (because all threads of a given process share the same virtual address space).
Of course there are other things which need to be swapped in as well (like the CPU registers), but for this discussion, we're only concerned with virtual memory.
On x86 CPUs, the CR3 register is the pointer to the base of the page table hierarchy. It is this register which the OS changes to change address spaces when swapping processes.
Of course, it's more complicated than that. Because the possible virtual address space is so large (4 GiB on x86-32, and 16 exabytes on x86-64), the pagetables themselves would take up a ridiculous amount of space (one entry for every 4 KiB page). To alleviate this, additional levels of indirection are added to the pagetables, which is why I referred to them as a hierarchy. On x86-64, there are 4 levels.
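To make the size argument concrete, here is a rough calculation. It assumes 4 KiB pages, 4-byte entries on x86-32, 8-byte entries on x86-64, and the 48-bit virtual address layout used by 4-level paging (four 9-bit indices plus a 12-bit offset). The function names are illustrative only.

```python
PAGE_SHIFT = 12          # 4 KiB pages -> 12 offset bits
INDEX_BITS = 9           # 512 entries per table with 4-level paging
LEVELS = 4


def flat_table_bytes(address_bits, entry_bytes):
    """Size of a hypothetical single-level table covering the whole space."""
    entries = 1 << (address_bits - PAGE_SHIFT)
    return entries * entry_bytes


def split_vaddr(vaddr):
    """Split a 48-bit virtual address into four table indices and an offset."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    indices = []
    for level in range(LEVELS - 1, -1, -1):   # top level first
        shift = PAGE_SHIFT + level * INDEX_BITS
        indices.append((vaddr >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


flat_32 = flat_table_bytes(32, 4)    # 4 MiB per process
flat_48 = flat_table_bytes(48, 8)    # 512 GiB per process
indices, offset = split_vaddr(0x00007F1234567ABC)
print(flat_32, flat_48, indices, hex(offset))
```

With a hierarchy, only the tables covering regions a process actually uses need to exist, so a small process pays for a handful of 4 KiB tables rather than a flat table spanning the entire address space.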
Now imagine if the CPU had to "walk" these paging structures for every virtual-to-physical translation. A single read from virtual memory would require a total of 5 memory accesses: one for each of the 4 levels, plus the access to the data itself. This would be terribly slow.
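The count of five can be reproduced with a toy walker. This is a sketch, assuming the 4-level, 9-bits-per-level layout and modelling each table as a Python dict; `read_memory` is a hypothetical stand-in for one physical memory access.

```python
PAGE_SHIFT, INDEX_BITS, LEVELS = 12, 9, 4
accesses = 0


def read_memory(table, key):
    """Stand-in for one physical memory access; counts each call."""
    global accesses
    accesses += 1
    return table[key]


def load(root, data, vaddr):
    """Walk the 4-level tables, then read the data itself."""
    node = root
    for level in range(LEVELS - 1, -1, -1):
        index = (vaddr >> (PAGE_SHIFT + level * INDEX_BITS)) & 0x1FF
        node = read_memory(node, index)        # one access per level
    paddr = node + (vaddr & 0xFFF)             # leaf entry holds the frame base
    return read_memory(data, paddr)            # the access we actually wanted


# Map virtual 0x1000 -> physical 0x88000 through four nested tables.
root = {0: {0: {0: {1: 0x88000}}}}
ram = {0x88000: "hello"}

value = load(root, ram, 0x1000)
print(value, accesses)
```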
Enter the Translation lookaside buffer, or TLB. The TLB caches these translations, so a given virtual-to-physical translation only requires the pagetables to be walked once. After that, the TLB remembers the translation, and is much faster. (Of course the TLB can get full, but cache eviction is another story).
So say PA is running, and all of a sudden the kernel swaps in the address space for PB. Now all of those cached virtual-to-physical translations are no longer valid for the new virtual address space! That means we need to flush the TLB, or clear all of its entries out. And because of that, the CPU has to do the slow pagetable walking again, until the TLB cache "heats up" again.
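The effect of a flush can be illustrated with a toy TLB. This is a sketch only: the `ToyMMU` class, its dict-based cache, and the `walks` counter are invented for illustration, and a real TLB is a fixed-size hardware structure rather than an unbounded dictionary.

```python
class ToyMMU:
    """Made-up MMU: a page table pointer plus a dict acting as the TLB."""

    def __init__(self):
        self.page_table = None
        self.tlb = {}
        self.walks = 0              # number of slow page table walks

    def switch_address_space(self, page_table):
        """Load a new page table and flush every cached translation."""
        self.page_table = page_table
        self.tlb.clear()

    def translate(self, vaddr):
        page, offset = vaddr & ~0xFFF, vaddr & 0xFFF
        if page not in self.tlb:    # TLB miss: do the slow walk once
            self.walks += 1
            self.tlb[page] = self.page_table[page]
        return self.tlb[page] + offset


TA = {0x1000: 0x88000}
TB = {0x1000: 0x99000}

mmu = ToyMMU()
mmu.switch_address_space(TA)
for _ in range(3):
    mmu.translate(0x1008)           # one walk, then two TLB hits
walks_in_pa = mmu.walks

mmu.switch_address_space(TB)        # flush: PA's translations are gone
result = mmu.translate(0x1008)      # must walk again for PB
print(walks_in_pa, mmu.walks, hex(result))
```

Without the flush in `switch_address_space`, the second process would keep hitting the first process's cached entry and read the wrong physical page, which is why the stale entries cannot simply be left in place in this model.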
This is why it's considered "expensive" to swap virtual address spaces. Not because it's hard to write to CR3, but because we trash the TLB every time we do.