SYSCALL and SYSRET (and their 32-bit-only Intel counterparts SYSENTER and SYSEXIT) are usually described as a “generally faster” way to enter and exit supervisor mode in x86 processors than call gates or software interrupts, but the exact figures underlying this claim remain largely undocumented. In particular, none of the Intel or AMD optimization guides I was able to find mention these instructions at all. So:
- How many cycles (estimated) do SYSCALL and SYSRET take across recent Intel 64 microarchitectures? This is probably measurable by direct experimentation, but there are quite a few different CPUs to test.
Depending on the order of magnitude of this number, more detailed questions may be relevant:
- Do they incur a complete pipeline stall, or any other kind of stall?
- How, if at all, do they interact with branch prediction (e.g. the return stack buffer) and fetch logic?
- What about latencies, data dependencies, serialization?
- &tc.
Assume 64-bit code on the userspace side, no additional address-space switches (writes to CR3) and even matching SYSCALL and SYSRET pairs if it matters.
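For reference, the direct experiment mentioned in the first question could look roughly like the sketch below. It assumes Linux on x86-64 with GCC or Clang, and the helper names (tsc_begin, tsc_end, min_syscall_ticks, min_overhead_ticks) are made up for illustration. Note what it does and does not measure: it times a whole getpid round trip, so the result includes the kernel's entry and exit code as well as SYSCALL and SYSRET themselves, and it counts TSC ticks, which run at a fixed reference frequency and are not necessarily core clock cycles. Treat the number as an upper bound, not as the cost of the two instructions.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <x86intrin.h>   /* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang, x86 only) */

/* Start timestamp: fences keep earlier and later instructions
   from being reordered across the TSC read. */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* End timestamp: RDTSCP waits for preceding instructions to execute;
   the trailing fence keeps later ones from starting early. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux;
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}

/* Minimum, over `iters` runs, of one raw getpid system call in TSC ticks.
   syscall(SYS_getpid) is used so that no libc-level caching gets in the way. */
uint64_t min_syscall_ticks(int iters)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < iters; i++) {
        uint64_t t0 = tsc_begin();
        syscall(SYS_getpid);
        uint64_t t1 = tsc_end();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Same loop with nothing between the two timestamps: the cost of the
   measurement itself, to be subtracted from the figure above. */
uint64_t min_overhead_ticks(int iters)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < iters; i++) {
        uint64_t t0 = tsc_begin();
        uint64_t t1 = tsc_end();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Calling both functions with a large iteration count (say 100000) from a thread pinned to one core and subtracting the second result from the first gives a rough per-call figure. Taking the minimum rather than the mean is a deliberate choice here, since it filters out interrupts and cache misses. The result will also depend heavily on the kernel version and on which mitigations are enabled.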
getpid() look doubtful when Bachmann and Walfield report 250 or so for two system calls. Sadly, Agner Fog hasn’t measured the SYS* instructions. – Alex Shpilkin
SYSCALL turned out to be much more security-relevant than I imagined in 2013...) – Alex Shpilkin