I want to see which pages are being accessed by my program.
You can run the program under a CPU simulator or emulator and collect this data. Options:
- 1) valgrind - a dynamic translator of user-space binaries with good instrumentation support. Try the cachegrind tool, which even emulates the L1/L2 caches; you can also build a new tool that logs all memory accesses (e.g. at page granularity).
- 2) qemu - a dynamic translator with both system-wide and process-wide modes. As far as I know, the original qemu has no instrumentation support.
- 3) bochs - a system-wide CPU emulator (very slow). You can easily hack the "memory access" code to produce a memory log.
- 4) PTLsim - www.ptlsim.org/papers/PTLsim-ISPASS-2007.pdf
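If you don't want to write a new valgrind tool, one option is to post-process a memory trace that valgrind can already produce. The sketch below assumes the trace comes from valgrind's Lackey tool run with `--trace-mem=yes` (lines such as ` L 04222cac,8`) and that pages are 4 KiB; the function name `pages_touched` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

PAGE_SHIFT = 12  # assumption: 4 KiB pages; change for other page sizes

# Lackey --trace-mem=yes lines look like " L 04222cac,8" or "I  04014073,3":
# an access kind (I, L, S or M), a hex address, then the access size in bytes.
TRACE_LINE = re.compile(r"^\s*([ILSM])\s+([0-9a-fA-F]+),(\d+)\s*$")

def pages_touched(lines, page_shift=PAGE_SHIFT):
    """Count accesses per virtual page number from Lackey trace lines."""
    counts = Counter()
    for line in lines:
        m = TRACE_LINE.match(line)
        if m is None:
            continue  # skip valgrind banners and the program's own output
        addr = int(m.group(2), 16)
        size = int(m.group(3))
        first = addr >> page_shift
        last = (addr + size - 1) >> page_shift
        # An access that straddles a page boundary touches both pages.
        for page in range(first, last + 1):
            counts[page] += 1
    return counts

# Hypothetical sample input, not real output from any program.
sample = [
    "==1234== Lackey, an example Valgrind tool",
    "I  04014073,3",
    " L 04222cac,8",
    " S 04222ffc,8",
    " M 04223000,4",
]
for page, n in sorted(pages_touched(sample).items()):
    print(f"page 0x{page:x}: {n} accesses")
```

To use it on a real run, something like `valgrind --tool=lackey --trace-mem=yes ./your_program 2> trace.txt` should give you a file you can pass in with `pages_touched(open("trace.txt"))`. Expect the trace to be very large and the run to be slow.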
However, this involves the overhead of setting protection bits for all the memory pages
Is this overhead too big?
Now the question is how to handle TLB misses in user space for a linux program.
You can't handle a TLB miss in either user space or kernel space (on x86 and many other popular platforms). This is because most platforms handle TLB misses in hardware: the MMU (part of the CPU/chipset) walks the page tables and obtains the physical address transparently.
A page fault interrupt is generated and delivered to the kernel only if the page table entry forbids the access (e.g. the protection bits are set) or the address region is not mapped.
Also, there seems to be no way to dump the TLB on modern CPUs (although the 386DX was able to do this).
You can try to detect a TLB miss by the delay it introduces. But this delay can be hidden, because out-of-order execution may start the TLB lookup early.
Also, most hardware events (memory accesses, TLB accesses, TLB hits, TLB misses) are counted by the hardware performance monitoring unit (this part of the CPU is used by VTune, CodeAnalyst and oprofile). Unfortunately, these are only global event counters, and you can't activate more than 2-4 events at the same time. The good news is that you can set a perfmon counter to raise an interrupt when a given count is reached. The interrupt then gives you the address of the instruction ($eip) at which the count was reached. So you can find TLB-miss-heavy hot spots with this hardware (it is in every modern x86 CPU, both Intel and AMD).
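As a rough illustration of the "interrupt every N events" idea: the Linux `perf` tool (not mentioned above, I'm swapping it in for oprofile) exposes the same counters. The sketch below only builds a `perf record` command line; the helper name `perf_record_cmd` and the default program `./my_program` are hypothetical, and the event name `dTLB-load-misses` is an assumption that may not be available on your CPU or kernel (check with `perf list`).

```python
import sys

def perf_record_cmd(program_argv, event="dTLB-load-misses", period=1000,
                    output="perf.data"):
    """Build a `perf record` command that samples once every `period` events."""
    if period <= 0:
        raise ValueError("period must be positive")
    return ["perf", "record",
            "-e", event,         # hardware event to count (assumed name)
            "-c", str(period),   # take one sample each time this many events occur
            "-o", output,        # file that receives the samples
            "--", *program_argv]

if __name__ == "__main__":
    target = sys.argv[1:] or ["./my_program"]  # hypothetical default target
    print(" ".join(perf_record_cmd(target)))
```

You can paste the printed command into a shell or pass the list to `subprocess.run`. Afterwards `perf report` should show which instructions the samples landed on, i.e. where the TLB misses cluster.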