2
votes

A simple yet complicated question:

Which counter should I use to get perf to measure wall-clock time?

As a baseline, the first thing I think I need to measure when profiling code is plain wall-clock time, to get a first idea of where the code spends most of its time. I don't care whether it's I/O-bound, bandwidth-limited or something else; I just want to know where it is slow.

It sounds like a simple requirement, but with all the tricks modern CPUs use to work efficiently (frequency scaling etc.) and the hell of a lot of different (not so well documented) performance counters available in perf, it's not easy to be sure I'm measuring the right thing.

Currently I do:

perf record -g -e ref-cycles -F 999 -- <cmd>

I think this counts at the unscaled CPU frequency and is thus proportional to the amount of wall-clock time that part of the code is running. But who the hell knows?

1
Yes, ref-cycles on a modern CPU always ticks at a constant rate, even when the core clock is halted. (The CPU feature is constant_tsc, and nonstop_tsc, which is really the same feature bit; see "How to get the CPU cycle count in x86_64 from C++?".) Of course there's also the software event task-clock, based on kernel-measured CPU time. IDK if that would work well or not. - Peter Cordes
Oh, but the ref-cycles perf event does stop when the core clock stops. It's separate from the actual TSC. (The real HW event on modern Intel is cpu_clk_unhalted.ref_tsc or cpu_clk_unhalted.ref_xclk_any.) Even clock halts to change CPU frequency affect it; see "Lost Cycles on Intel? An inconsistency between rdtsc and CPU_CLK_UNHALTED.REF_TSC". And that's for a workload that doesn't sleep. So ref-cycles is fine for finding CPU hotspots, but not for overall profiles where I/O waits matter. - Peter Cordes
Do you have any recommendation for measuring overall wall-clock time? Is there any event available that just reads the TSC? Or is that approach the wrong idea in general? - Peter
Ok. I think I misunderstood your comment. Did you say cpu_clk_unhalted.ref_tsc is what I’m looking for or did you say it’s affected by halts? - Peter
My first comment was part brain-fart, 2nd comment is a correction. I guess I should have deleted / reposted a corrected version. - Peter Cordes

1 Answer

2
votes

You can use task-clock, for example by passing -e task-clock instead of -e ref-cycles to perf record.

This software event counts wall-clock time only while the process is actually running on a CPU, so time spent sleeping or blocked on I/O is not counted. As a bonus, it is portable because it doesn't rely on any PMU event.
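
The difference between wall-clock time and time spent running on a CPU can be seen without perf at all. The following is only a minimal Python sketch of that distinction, not a model of how perf samples: the helper names `measure`, `sleepy` and `busy` are made up for illustration, and `time.sleep` merely stands in for an I/O wait. It compares `time.perf_counter()` (keeps ticking while the process sleeps) with `time.process_time()` (CPU time of the process only).

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent in fn()."""
    wall0 = time.perf_counter()   # wall-clock timer, keeps running during sleep
    cpu0 = time.process_time()    # user + system CPU time of this process only
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0

def sleepy():
    # Hypothetical stand-in for an I/O wait: the process is off-CPU here.
    time.sleep(0.2)

def busy():
    # Spin on the CPU for roughly 0.2 s of wall-clock time.
    deadline = time.perf_counter() + 0.2
    while time.perf_counter() < deadline:
        pass

if __name__ == "__main__":
    for fn in (sleepy, busy):
        wall, cpu = measure(fn)
        print(f"{fn.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

Both functions take about the same wall-clock time, but only `busy` accumulates a comparable amount of CPU time. A profile driven by an on-CPU clock would therefore attribute almost nothing to `sleepy`, which is worth keeping in mind if waits are part of what makes the code slow.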