I'm trying to read a PMC (Performance Monitoring Counter) using the RDMSR and WRMSR instructions.
On my Linux desktop, which has an Intel Core i7-6700 CPU (Skylake), I wrote a simple driver:
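#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

static int __init my_init(void)
{
    u32 msr;
    u32 low, high;

    /* IA32_PERFEVTSEL1: event 0x2E, umask 0x41 (architectural LLC Misses) */
    msr = 0x187;
    low = 0x412e;
    high = 0x0;
    asm volatile("wrmsr" : : "c"(msr), "a"(low), "d"(high) : "memory");

    /* IA32_PMC1: RDMSR returns bits 31:0 in EAX and bits 63:32 in EDX */
    msr = 0xC2;
    asm volatile("rdmsr" : "=a"(low), "=d"(high) : "c"(msr));

    pr_info("val: %llu\n", ((u64)high << 32) | low);
    return 0;
}

static void __exit my_exit(void) { }

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");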
Following the Intel manual (18.2 ARCHITECTURAL PERFORMANCE MONITORING in Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3B: System Programming Guide), the code above writes 0x412e (the L3 cache-miss event) to MSR 0x187 (IA32_PERFEVTSEL1) and then reads MSR 0xC2 (IA32_PMC1).
According to the manual, RDMSR should return the counter value in EDX:EAX (EAX holds the low 32 bits), but in practice both the low (EAX) and high (EDX) halves come back as 0.
How can I monitor a performance event on an Intel CPU using an MSR pair (IA32_PERFEVTSELx and IA32_PMCx)? Specifically, I want to count cache misses.
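The value 0x412e fills only two fields of IA32_PERFEVTSEL1: event select 0x2E in bits 7:0 and unit mask 0x41 in bits 15:8, which together name the architectural "LLC Misses" event. The flag bits above them, including USR (bit 16), OS (bit 17) and EN (bit 22), are all zero, so as written the counter is not enabled and is not told which privilege levels to count. A minimal sketch of the encoding follows; the macro and function names are hypothetical, while the bit positions are the SDM's.

```c
#include <stdint.h>

/* Selected IA32_PERFEVTSELx flag bits (Intel SDM Vol. 3B, architectural PMU). */
#define PERFEVTSEL_USR (UINT64_C(1) << 16) /* count when CPL > 0 (user mode) */
#define PERFEVTSEL_OS  (UINT64_C(1) << 17) /* count when CPL == 0 (kernel mode) */
#define PERFEVTSEL_EN  (UINT64_C(1) << 22) /* enable this counter */

/* Hypothetical helper: event select in bits 7:0, unit mask in bits 15:8,
 * flag bits ORed in on top. */
static uint64_t perfevtsel(uint8_t event, uint8_t umask, uint64_t flags)
{
    return (uint64_t)event | ((uint64_t)umask << 8) | flags;
}
```

With no flags, `perfevtsel(0x2E, 0x41, 0)` reproduces the question's 0x412E; adding `PERFEVTSEL_EN | PERFEVTSEL_OS | PERFEVTSEL_USR` yields 0x43412E, a value that both enables the counter and counts in user and kernel mode.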
If you have any idea about this, I would appreciate your advice. Thanks.
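A user-space sketch of the whole programming sequence follows. It assumes the Linux `msr` driver is loaded (`modprobe msr`), that the program runs as root, and that nothing else (for example the kernel's `perf` subsystem) is using general-purpose counter 1. Besides the EN bit in the event select, architectural PMU version 2 and later also gate each counter through IA32_PERF_GLOBAL_CTRL (MSR 0x38F), where bit 1 corresponds to PMC1, so the sketch sets that bit as well. All function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* MSR addresses from the Intel SDM (architectural performance monitoring). */
#define IA32_PMC1             0xC2
#define IA32_PERFEVTSEL1      0x187
#define IA32_PERF_GLOBAL_CTRL 0x38F

/* The Linux msr driver uses the file offset as the MSR index; accesses are 8 bytes. */
static int msr_write(int fd, uint32_t msr, uint64_t val)
{
    return pwrite(fd, &val, sizeof val, (off_t)msr) == (ssize_t)sizeof val ? 0 : -1;
}

static int msr_read(int fd, uint32_t msr, uint64_t *val)
{
    return pread(fd, val, sizeof *val, (off_t)msr) == (ssize_t)sizeof *val ? 0 : -1;
}

/* Program PMC1 to count LLC misses in user and kernel mode.
 * fd must be an open /dev/cpu/N/msr; returns 0 on success, -1 on error. */
static int start_llc_miss_count(int fd)
{
    uint64_t global;

    if (msr_write(fd, IA32_PERFEVTSEL1, 0) != 0)        /* stop counter 1 */
        return -1;
    if (msr_write(fd, IA32_PMC1, 0) != 0)               /* clear its count */
        return -1;
    /* EN (bit 22) | OS (bit 17) | USR (bit 16) | umask 0x41 | event 0x2E */
    if (msr_write(fd, IA32_PERFEVTSEL1, 0x43412E) != 0)
        return -1;
    if (msr_read(fd, IA32_PERF_GLOBAL_CTRL, &global) != 0)
        return -1;
    /* Bit 1 of IA32_PERF_GLOBAL_CTRL enables general-purpose counter 1 */
    return msr_write(fd, IA32_PERF_GLOBAL_CTRL, global | (UINT64_C(1) << 1));
}

/* Read the current PMC1 count; returns 0 on success, -1 on error. */
static int read_llc_miss_count(int fd, uint64_t *count)
{
    return msr_read(fd, IA32_PMC1, count);
}
```

The counters are per core, so open `/dev/cpu/N/msr` for the core you care about and pin the measured workload to that same core (for example with `taskset -c N`) before calling `read_llc_miss_count`.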
perf subsystem, there are a couple direct-access implementations already that let you program the perf counters and then read them directly with rdpmc in user-space. e.g. github.com/obilaniu/libpfc (by SO user @Iwillnotexist) is used by @BeeOnRope's uarch-bench. There's also Agner Fog's testp stuff (agner.org/optimize/#testp). – Peter Cordes
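For the user-space `rdpmc` route mentioned in the comment, here is a minimal x86-only sketch. RDPMC takes the counter index in ECX and returns the value in EDX:EAX; on Intel, setting bit 30 of ECX selects the fixed-function counters instead of the general-purpose ones. In user mode the instruction faults unless the kernel has set CR4.PCE; on Linux this is governed by `/sys/bus/event_source/devices/cpu/rdpmc`, so whether the call works is an assumption about your system. The libraries named above take care of that setup; the function names here are hypothetical.

```c
#include <stdint.h>

/* Build the ECX operand: bit 30 selects Intel's fixed-function counters. */
static uint32_t rdpmc_ecx(int fixed, uint32_t index)
{
    return (fixed ? UINT32_C(1) << 30 : 0) | index;
}

#if defined(__x86_64__) || defined(__i386__)
/* Execute RDPMC; raises SIGSEGV in user mode unless CR4.PCE is set. */
static inline uint64_t rdpmc(uint32_t ecx)
{
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(ecx));
    return ((uint64_t)hi << 32) | lo;   /* combine EDX:EAX into 64 bits */
}
#endif
```

Under those assumptions, `rdpmc(rdpmc_ecx(0, 1))` would read general-purpose counter 1, the one configured through IA32_PERFEVTSEL1. Casting `hi` to 64 bits before the shift matters: shifting a 32-bit value by 32 is undefined behavior in C.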