A simple yet complicated question:
Which counter should I use to get the perf tools to measure wall-clock time?
As a baseline, I think the first thing I need to measure when profiling code is plain wall-clock time, to get a first idea of where the code spends most of its time. I don't care whether it's I/O-bound, bandwidth-limited, or something else; I just want to know where it is slow.
It sounds like a simple requirement, but with all the tricks modern CPUs use to work efficiently (frequency scaling, etc.) and the huge number of different (and not so well documented) performance counters available in perf, it's not easy to be sure I'm measuring the right thing.
Currently I do:
perf record -g -e ref-cycles -F 999 -- <cmd>
I think ref-cycles counts at a reference frequency that is not affected by CPU frequency scaling, and is thus proportional to the wall-clock time each part of the code is running. But who the hell knows?
constant_tsc (and nonstop_tsc which is really the same feature bit: How to get the CPU cycle count in x86_64 from C++?). Of course there's also the software event task-clock based on kernel-measured CPU time. IDK if that would work well or not. - Peter Cordes

ref-cycles perf event does stop when the core clock stops. It's separate from the actual TSC. (The real HW event on modern Intel is cpu_clk_unhalted.ref_tsc or cpu_clk_unhalted.ref_xclk_any). Even clock halts to change CPU frequency affect it: Lost Cycles on Intel? An inconsistency between rdtsc and CPU_CLK_UNHALTED.REF_TSC. And that's for a workload that doesn't sleep. So ref-cycles is fine for finding CPU hotspots, but not for overall profiles where I/O waits matter. - Peter Cordes
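
To make the distinction between CPU time and wall-clock time concrete, here is a minimal Python sketch. It is only an analogy and does not use perf or any hardware counter: time.process_time() stands in for a clock that advances only while the process is running on a CPU, and time.perf_counter() stands in for wall-clock time. The helper names measure, blocked, and busy are made up for this illustration.

```python
import time


def measure(fn):
    """Run fn() and return (wall_seconds, cpu_seconds) it took."""
    wall_start = time.perf_counter()   # wall-clock style timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def blocked():
    # Stand-in for an I/O wait: the process is off-CPU while sleeping.
    time.sleep(0.2)


def busy():
    # Stand-in for a CPU hotspot: spin for about 0.2 s of wall-clock time.
    end = time.perf_counter() + 0.2
    while time.perf_counter() < end:
        pass


wall_blocked, cpu_blocked = measure(blocked)
wall_busy, cpu_busy = measure(busy)

print(f"blocked: wall={wall_blocked:.3f}s cpu={cpu_blocked:.3f}s")
print(f"busy:    wall={wall_busy:.3f}s cpu={cpu_busy:.3f}s")
```

Both functions take roughly the same wall-clock time, but only the busy loop accumulates a comparable amount of CPU time. A profile driven by a clock that stops while the task is not running would therefore show busy() as hot and barely register blocked(), which is the limitation described in the comment above.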