3
votes

I'm trying to read a PMC (Performance Monitoring Counter) using the RDMSR and WRMSR instructions.

On my Linux desktop, which has an Intel i7-6700 CPU (Skylake), I wrote this simple driver code:

static int my_init(void)
{
    unsigned int msr;
    u64 low, high;

    msr = 0x187;
    low = 0x412e;
    high = 0x0;

    asm volatile("1: wrmsr\n"
            "2:\n"
            : : "c" (msr), "a"(low), "d" (high) : "memory");

    msr = 0xC2;
    asm volatile("1: rdmsr\n"
            "2:\n"
            : "=a" (low), "=d" (high) : "c" (msr)); 

    printk("val: %llu\n", low | (high << 32));

    return  0;
}

Following the Intel manual (18.2 ARCHITECTURAL PERFORMANCE MONITORING in Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3B: System Programming Guide), the code above writes 0x412e (event select 0x2E with unit mask 0x41, i.e. the number of L3 cache misses) to MSR 0x187 (IA32_PERFEVTSEL1) and then reads MSR 0xC2 (IA32_PMC1).

According to the manual, RDMSR should return the cache-miss count in EDX:EAX (EAX holds the low 32 bits). In practice, however, both the low (EAX) and high (EDX) halves come back as 0.

I want to know how to monitor performance events on an Intel CPU using an MSR pair (IA32_PERFEVTSELx and IA32_PMCx). Specifically, my goal is to count cache misses.

If you have any idea about this, I would appreciate your advice. Thanks.

2
BTW, you don't have to write your own driver. Besides Linux's perf subsystem, there are a couple of direct-access implementations already that let you program the perf counters and then read them directly with rdpmc in user-space. e.g. github.com/obilaniu/libpfc (by SO user @Iwillnotexist) is used by @BeeOnRope's uarch-bench. There's also Agner Fog's testp stuff (agner.org/optimize/#testp). – Peter Cordes
Thanks for your comment. I will check out the macro. By using it, I think it becomes easier to get to my goal. – nickeys

2 Answers

4
votes

Your programming of PERFEVTSEL1 is incomplete.

At the very least, you should enable counting in bit 22:

  • EN (Enable Counters) Flag (bit 22) — When set, performance counting is enabled in the corresponding performance-monitoring counter; when clear, the corresponding counter is disabled.
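
To make the bit layout concrete, here is a minimal sketch of how the value written to IA32_PERFEVTSEL1 could be assembled. The helper names (`perfevtsel_encode`, `msr_join`) are hypothetical and not part of any kernel API; the bit positions are the ones the SDM documents for the architectural event-select MSRs. Besides EN, the sketch also defines the USR (bit 16) and OS (bit 17) flags, on the assumption that you want to count in both user and kernel mode, since a counter with neither flag set has no privilege level to count in.

```c
#include <stdint.h>

/* Flag bits in IA32_PERFEVTSELx (architectural performance monitoring). */
#define PERFEVTSEL_USR (1ULL << 16) /* count while running in user mode (CPL > 0) */
#define PERFEVTSEL_OS  (1ULL << 17) /* count while running in kernel mode (CPL 0) */
#define PERFEVTSEL_EN  (1ULL << 22) /* enable the corresponding counter */

/* Hypothetical helper: pack event select (bits 0-7), unit mask (bits 8-15)
 * and flag bits into one event-select value. */
static inline uint64_t perfevtsel_encode(uint8_t event, uint8_t umask,
                                         uint64_t flags)
{
    return (uint64_t)event | ((uint64_t)umask << 8) | flags;
}

/* Hypothetical helper: rebuild the 64-bit counter value from the two
 * 32-bit halves that RDMSR leaves in EAX (low) and EDX (high). */
static inline uint64_t msr_join(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}
```

With these definitions, `perfevtsel_encode(0x2e, 0x41, 0)` gives the 0x412e from the question, adding `PERFEVTSEL_EN` gives 0x40412e, and adding all three flags gives 0x43412e. On CPUs with architectural performance monitoring version 2 or later, the counter must also be enabled in IA32_PERF_GLOBAL_CTRL (MSR 0x38F, bit 1 for PMC1); that step is not shown here.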
-1
votes

You can take a look at this source code: HPCTestDrv.c