17
votes

SYSCALL and SYSRET (and their 32-bit-only Intel counterparts SYSENTER and SYSEXIT) are usually described as a “generally faster” way to enter and exit supervisor mode in x86 processors than call gates or software interrupts, but the exact figures underlying this claim remain largely undocumented. In particular, none of the Intel or AMD optimization guides I was able to find mention these instructions at all. So:

  • How many cycles (estimated) do SYSCALL and SYSRET take across recent Intel 64 microarchitectures? This is probably measurable by direct experimentation, but there are quite a few different CPUs to test.

Depending on the order of magnitude of this number, more detailed questions may be relevant:

  • Do they incur a complete pipeline stall, or any other kind of stall?
  • How, if at all, do they interact with branch prediction (e.g. the return stack buffer) and fetch logic?
  • What about latencies, data dependencies, serialization?
  • &tc.

Assume 64-bit code on the userspace side, no additional address-space switches (writes to CR3) and even matching SYSCALL and SYSRET pairs if it matters.

1
lkml.org/lkml/2002/12/9/13 - that's the original posting with the benchmarks. These numbers would vary somewhat these days, I guess. Agner Fog's latency/throughput tables should give you an idea as well. - FrankH.
@FrankH. I’d expect these figures to vary considerably: P4’s pipeline is much less friendly to context switches than that of e.g. Sandy Bridge. And the 600-something cycles for getpid() look doubtful when Bachmann and Walfield report 250 or so for two system calls. Sadly, Agner Fog hasn’t measured the SYS* instructions. - Alex Shpilkin
I said I do expect them to vary - the reference above is almost 11 years old. The difference between somewhat and considerable I'd leave to the eye of the beholder :) In that sense, I've merely given the link because it describes the benchmark done back then - which means you could repeat it, right now, on current CPUs, if you like / if you have them available. Not aware of anyone having done that lately, though. - FrankH.
There is a paper from 2010 about real syscall costs: cs.cmu.edu/~chensm/Big_Data_reading_group/papers/… "FlexSC: Flexible System Call Scheduling with Exception-Less System Calls". They show that syscalls have a negative impact on IPC. - osgx
(The point about branch prediction across SYSCALL turned out to be much more security-relevant than I imagined in 2013...) - Alex Shpilkin

1 Answer

2
votes

I was curious too, so I've written some basic bare-metal code to benchmark it: just a loop that executes syscall 1000000 times, with the syscall handler running sysret and nothing else. On my Ryzen 7 3700X it averages 78 cycles for the call+return.

Obviously that's an artificial benchmark, because a real system call handler will likely need to do some things like switch stacks and perform Spectre mitigations. But it gives an idea of the order-of-magnitude, which is less than a cache miss.
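
For comparison, here is a rough userspace sketch of the same kind of measurement. It is not the bare-metal code described above. It assumes Linux with glibc and times a real getpid system call, so the result includes the kernel's entry/exit path and any mitigations, and should be read as an upper bound on the SYSCALL+SYSRET pair rather than its raw cost. The helper names read_ticks and average_syscall_ticks are made up for illustration. On x86 the tick source is the time-stamp counter, which counts reference cycles rather than core clock cycles; on other architectures the sketch falls back to nanoseconds.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the time-stamp counter (reference cycles, not core clock cycles). */
static inline uint64_t read_ticks(void) {
    return __rdtsc();
}
#else
/* Fallback for non-x86 builds: nanoseconds from a monotonic clock. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
#endif

/*
 * Hypothetical helper: average number of ticks per getpid round trip.
 * syscall(SYS_getpid) goes through glibc's generic syscall() wrapper,
 * so every iteration really enters the kernel.
 */
double average_syscall_ticks(long iterations) {
    assert(iterations > 0);
    uint64_t start = read_ticks();
    for (long i = 0; i < iterations; i++) {
        syscall(SYS_getpid);
    }
    uint64_t end = read_ticks();
    return (double)(end - start) / (double)iterations;
}
```

Calling average_syscall_ticks(1000000) from main and printing the result gives a per-call figure. The loop overhead and the wrapper call are included, and the number will vary with the CPU, the kernel version and the enabled mitigations, so pinning the process to one core and repeating the run a few times is advisable.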