19
votes

It's a known fact that Windows applications normally have 2GB of private address space on a 32-bit system. This space can be extended to 3GB with the /3GB switch.

The operating system reserves the remaining portion of the 4GB for itself.

My question is WHY?

Code running in kernel mode (i.e. device driver code) has its own address space. If it already has an exclusive 4GB address space, why does the operating system still reserve 2GB of each user-mode process?

I thought the reason was the transition between user-mode and kernel-mode calls. For example, a call to NtWriteFile needs an address for the kernel dispatch routine (hence the 2GB the system reserves in each application). But with SYSENTER, isn't the system service number enough for the kernel-mode code to know which function/service is being called?

Could you clarify why it's so important for the operating system to take 2GB (or 1GB) of each user-mode process?

5
I seriously can not believe people voted to close this as NPR. – Dave Markle
I agree, Dave, it's borderline crazy to assert that operating-system design is "not programming related". – Alex Martelli
Everywhere across your post you have used lowercase 'b' which represents bit. It should be capital 'B' which represents Byte. All operating systems follow Byte addressable scheme. So for a 32-bit PC, 2 raised to the power 32 is equal to 4 Gb (4 Gb unique addresses) but since computer actually addresses a byte instead of a bit, it becomes 4GB. – RBT

5 Answers

22
votes

Two different user processes have different virtual address spaces. Because the virtual↔physical address mappings are different, the TLB cache is invalidated when switching contexts from one user process to another. This is very expensive, as without the address already cached in the TLB, any memory access will result in a fault and a walk of the PTEs.

Syscalls involve two context switches: user→kernel, and then kernel→user. In order to speed this up, it is common to reserve the top 1GB or 2GB of virtual address space for kernel use. Because the virtual address space does not change across these context switches, no TLB flushes are necessary. This is enabled by a user/supervisor bit in each PTE, which ensures that kernel memory is only accessible while the CPU is in supervisor mode; userspace has no access even though the page table is the same.

If there were hardware support for two separate TLBs, with one exclusively for kernel use, then this optimization would no longer be useful. However, if you have enough space to dedicate, it's probably more worthwhile to just make one larger TLB.

Linux on x86 once supported a mode known as "4G/4G split". In this mode, userspace has full access to the entire 4GB virtual address space, and the kernel also has a full 4GB virtual address space. The cost, as mentioned above, is that every syscall requires a TLB flush, along with more complex routines to copy data between user and kernel memory. This has been measured to impose up to a 30% performance penalty.


Times have changed since this question was originally asked and answered: 64-bit operating systems are now much more prevalent. In current OSes on x86-64, virtual addresses from 0 to 2^47−1 (0-128TB) are allowed for user programs while the kernel permanently resides within virtual addresses from 2^47×(2^17−1) to 2^64−1 (or from −2^47 to −1, if you treat addresses as signed integers).

What happens if you run a 32-bit executable on 64-bit Windows? You would think that all virtual addresses from 0 to 2^32 (0-4GB) would easily be available, but in order to avoid exposing bugs in existing programs, 32-bit executables are still limited to 0-2GB unless they are recompiled with /LARGEADDRESSAWARE. Those that are get access to 0-4GB. (This is not a new flag; the same applied to 32-bit Windows kernels running with the /3GB switch, which changed the default 2G/2G user/kernel split to 3G/1G, although of course 3-4GB would still be out of range.)

What sorts of bugs might there be? As an example, suppose you are implementing quicksort and have two pointers, a and b, pointing at the start and one past the end of an array. If you choose the middle as the pivot with (a+b)/2, it'll work as long as both addresses are below 2GB, but if they are both above, then the addition will encounter integer overflow and the result will be outside the array. (The correct expression is a+(b-a)/2.)

As an aside, 32-bit Linux, with its default 3G/1G user/kernel split, has historically run programs with their stack located in the 2-3GB range, so any such programming errors would likely have been flushed out quickly. 64-bit Linux gives 32-bit programs access to 0-4GB.

3
votes

Windows (like any OS) is a lot more than the kernel + drivers.

Your application relies on a lot of OS services that do not exist solely in kernel space. There are a lot of buffers, handles and all sorts of resources that can get mapped into your process' own address space. Whenever you call a Win32 API function that returns, say, a window handle or a brush, those things have to be allocated somewhere in your process. So part of Windows runs in the kernel, yes; other parts run in their own user-mode processes; and some, the ones your application needs direct access to, are mapped into your address space.

Part of this is hard to avoid, but an important additional factor is performance. If every Win32 call required a context switch, it would be a major performance hit. If some of them can be handled in user mode because the data they rely on is already mapped into your address space, the context switch is avoided, and you save quite a few CPU cycles.

So any OS needs some amount of the address space set aside. I believe Linux by default sets aside only 1GB for the OS.

The reason why MS settled on 2GB with Windows was explained on Raymond Chen's blog once. I don't have the link, and I can't remember the details, but the decision was made because Windows NT originally targeted the Alpha processor as well, and on the Alpha there was some REALLY good reason to do the 50/50 split. ;)

It was something to do with the Alpha's support for 32 as well as 64-bit code. :)

2
votes

Code running in kernel mode (i.e. device driver code) has its own address space.

No, it does not. On x86 processors, kernel-mode code has to share the address space with the user-mode portion of the current process. That's why the kernel has to reserve part of each process's finite address space for itself.

1
votes

I believe the best answer is that the OS designers felt that by the time you would have to care, people would be using 64-bit Windows.

But here's a better discussion.

0
votes

Part of the answer is to do with the history of microprocessor architectures. Here's some of what I know, others can provide more recent details.

The Intel 8086 processor had a segment-offset architecture for memory, giving 20 bit memory addresses, and therefore total addressable physical memory of 1MB.

Unlike competing processors of the era - like the Zilog Z80 - the Intel 8086 had only one address space which had to accommodate not only electronic memory, but all input/output communication with such minor peripherals as keyboard, serial ports, printer ports and video displays. (For comparison, the Zilog Z80 had a separate input/output address space with dedicated assembly opcodes for access)

The need to allow space for an ever growing range of peripheral expansions led to the original decision to partition the address space into RAM from 0-640K, and "other stuff" (input/output, ROMs, video memory, etc.) from 640K to 1MB.

As the x86 line grew and evolved, and PCs evolved with it, similar schemes have been used, ending with today's 2G/2G split of the 4GB address space.