You are confusing a page with a page table entry:
- The page table consists of page table entries
- Each page table entry holds the physical memory address of one page
- The page is a 16kB slice of memory
You want to map 4GB of physical memory onto a number of 16kB pages.
So you need 4GB / 16kB = 2^32 B / 2^14 B = 2^18 = 262,144 pages (addresses)
Each address is held in a page table entry, which is a 32-bit integer that consists of the physical address of a page plus some modifier/info bits about that page.
Each process needs at least one page table. Here it consists of 262,144 page table entries. Each page table entry is a 32-bit (4B) number (call it whatever you want)
So the total size needed for one process is:
262,144 * 4B = 1,048,576 B = 1MB
Why this approach isn't used, based on the paging model of the x86 MMU:
The problem with a single-level paging mechanism as you describe is that every process, no matter how small, needs the full table:
4GB / 16kB = 2^32 B / 2^14 B = 262,144 page table entries
262,144 * 4 bytes (32 bits per entry) = 1MB per process
And that is with 16kB pages. Most x86 systems use 4kB pages, so you would need 2^20 entries * 4 bytes = 4MB for each process.
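To check the single-level arithmetic, here is a minimal sketch. It assumes 4GB means 2^32 bytes, kB means 1024 bytes, and 4-byte entries; the function name is made up for illustration.

```python
def single_level_table_bytes(address_space_bytes, page_bytes, entry_bytes=4):
    """Return (number of entries, table size in bytes) for a flat page table."""
    entries = address_space_bytes // page_bytes
    return entries, entries * entry_bytes

ADDRESS_SPACE = 2**32  # assumption: 4GB address space on a 32-bit machine

for page_size in (16 * 1024, 4 * 1024):
    entries, size = single_level_table_bytes(ADDRESS_SPACE, page_size)
    print(f"{page_size // 1024}kB pages: {entries} entries, {size // 1024}kB table")

# 16kB pages: 262144 entries, 1024kB table
# 4kB pages: 1048576 entries, 4096kB table
```

The table size does not depend on how much memory the process actually touches, which is the whole problem with the flat layout.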
This is why the x86 uses two-level paging: each process has a page directory with 1024 entries, and each of those entries holds the address of a page table (which holds 1024 page table entries). Page tables are only allocated for the regions the process actually uses, so the minimal allocated memory of a process becomes:
1024 * 4 bytes (page directory) + 1024 * 4 bytes (one page table of 1024 32-bit entries) = 8kB
Each page table entry points to a 4kB physical page in memory.
1024 page directory entries * 1024 page table entries per page table * 4kB per page = 4GB of addressable memory
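The two-level numbers can be sketched the same way. This assumes the whole page directory (all 1024 entries) is always allocated and that page tables are allocated one at a time as needed; the constant and function names are hypothetical.

```python
ENTRIES_PER_TABLE = 1024   # entries in the page directory and in each page table
ENTRY_BYTES = 4            # 32-bit entries
PAGE_BYTES = 4 * 1024      # 4kB pages

def addressable_bytes():
    """Memory reachable through a full directory of full page tables."""
    return ENTRIES_PER_TABLE * ENTRIES_PER_TABLE * PAGE_BYTES

def paging_overhead_bytes(page_tables_in_use):
    """Size of the page directory plus the page tables actually allocated."""
    directory = ENTRIES_PER_TABLE * ENTRY_BYTES
    tables = page_tables_in_use * ENTRIES_PER_TABLE * ENTRY_BYTES
    return directory + tables

print(addressable_bytes() == 2**32)     # True
print(paging_overhead_bytes(1))         # 8192
print(paging_overhead_bytes(1024))      # 4198400
```

A small process pays about 8kB, while only a process that touches the entire 4GB pays the full 4MB plus the directory.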
The virtual address consists, roughly, of:
- Page directory entry index (upper 10 bits)
- Page table entry index (next 10 bits)
- Offset in the 4kB page (lower 12 bits)
This means that even if you increase the size of a single page (thus reducing the number of page table entries needed), the offset needs more bits to address every byte within the larger page.
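As an illustration of that split, here is a small sketch that takes a 32-bit virtual address apart. It assumes the classic 10/10/12 layout for 4kB pages; the function names are made up, and real hardware does this in the MMU, not in software.

```python
def offset_bits(page_bytes):
    """Bits needed to address every byte in a page (page size is a power of two)."""
    return page_bytes.bit_length() - 1

def split_virtual_address(vaddr, page_bytes=4096, index_bits=10):
    """Return (page directory index, page table index, offset in page)."""
    shift = offset_bits(page_bytes)
    index_mask = (1 << index_bits) - 1
    offset = vaddr & (page_bytes - 1)
    table_index = (vaddr >> shift) & index_mask
    directory_index = (vaddr >> (shift + index_bits)) & index_mask
    return directory_index, table_index, offset

print(split_virtual_address(0x12345678))       # (72, 837, 1656)
print(offset_bits(4096), offset_bits(16384))   # 12 14
```

Going from 4kB to 16kB pages moves two bits from the index part of the address into the offset.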