
I'm running a multithreaded program on an AMD Abu Dhabi architecture that has 8 NUMA domains. I'm using numactl to place the threads on different cores, and I'm trying different memory policies. I want to measure cache misses/hits broken down by NUMA domain, but tools like perf only give me an overall counter. I have already looked at tools like numastat, likwid, and hpctoolkit. Do you know of any tool that reports the standard performance counters separated by NUMA domain?

Using the --per-socket option of perf stat gave me the counters I needed, although it is not exactly what I wanted, because on AMD architectures there are 2 NUMA nodes per socket. - Jofe
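One way to get closer to per-NUMA-domain numbers with plain perf is to count per CPU instead of per socket (perf stat -a -A, i.e. --no-aggr, with -x, for CSV output) and then sum the CPUs that belong to each node, using the CPU lists the kernel exposes in /sys/devices/system/node/node*/cpulist. The Python sketch below does that aggregation. It assumes each CSV line starts with CPU<n>,<count>,<unit>,<event>; the exact column layout can vary between perf versions, so check your own output first. The function names are hypothetical. Recent perf releases also document a --per-node aggregation mode for system-wide runs, which may make this unnecessary if your perf version has it.

```python
"""Aggregate per-CPU perf stat counts into per-NUMA-node totals (sketch)."""
from collections import defaultdict
from pathlib import Path


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as "0-3,8,10-11" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpu_to_node_map(sysfs_root="/sys/devices/system/node"):
    """Build {cpu: node} from the node*/cpulist files under sysfs_root."""
    mapping = {}
    for node_dir in Path(sysfs_root).glob("node[0-9]*"):
        node = int(node_dir.name[len("node"):])
        for cpu in parse_cpulist((node_dir / "cpulist").read_text()):
            mapping[cpu] = node
    return mapping


def aggregate_perf_csv(lines, cpu_to_node):
    """Sum `perf stat -a -A -x,` lines into {(node, event): count}.

    Assumes each data line starts with "CPU<n>,<count>,<unit>,<event>,...".
    Comment lines, CPUs missing from the map, and non-integer values
    (e.g. "<not counted>", or msec values for task-clock) are skipped.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 4 or not fields[0].startswith("CPU"):
            continue
        try:
            cpu = int(fields[0][3:])
            count = int(fields[1])
        except ValueError:
            continue
        node = cpu_to_node.get(cpu)
        if node is None:
            continue
        totals[(node, fields[3])] += count
    return dict(totals)
```

Typical use, assuming a capture such as perf stat -a -A -x, -e cache-misses,cache-references -o counts.csv -- ./your_program: build the map with cpu_to_node_map() and pass the opened counts.csv file to aggregate_perf_csv(). Keep in mind that system-wide (-a) per-CPU counts include everything else running on those CPUs, not only your program.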

2 Answers


Is numastat not sufficient for your needs?

>numastat
                           node0           node1           node2           node3
numa_hit              2511148413      2668024472      2541805396      2631938751
numa_miss                 687767          186973          510852           79546
numa_foreign              544853         1772504         1306738         1461626
interleave_hit             14268           14291           14281           14309
local_node            2509822983      2667700745      2541325673      2631417570
other_node               2013197          510700          990575          600727

                           node4           node5           node6           node7
numa_hit              2551615375      2287945142      2199394273      2506262343
numa_miss                1178554         1863536         2037710         1278384
numa_foreign             1709984          541463          241266          244888
interleave_hit             14287           14274           14291           14294
local_node            2551212630      2278515165      2198877939      2505436756
other_node               1581299        11293513         2554044         2103971
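Note that numastat reports page allocations made by the kernel, not hardware cache events. Per its man page, local_node counts memory allocated on a node while the process was running on that node, and other_node counts memory allocated on it while the process was running on some other node. So it describes memory placement rather than cache hit/miss rates per domain. If placement is what you care about, a small Python sketch (helper names are hypothetical) can turn output like the above into a per-node remote-allocation fraction, other_node / (local_node + other_node). For node5 above that is 11293513 / (2278515165 + 11293513), roughly 0.49%.

```python
def parse_numastat(text):
    """Parse default `numastat` output into {node_name: {counter: value}}.

    Wide systems are printed in several column blocks; each block starts
    with a header row of node names, followed by one row per counter.
    """
    stats = {}
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if all(f.startswith("node") for f in fields):
            nodes = fields  # header row for the block that follows
            for n in nodes:
                stats.setdefault(n, {})
            continue
        counter, values = fields[0], fields[1:]
        for node, value in zip(nodes, values):
            stats[node][counter] = int(value)
    return stats


def remote_fraction(node_stats):
    """Share of this node's allocations made for processes running elsewhere."""
    local = node_stats["local_node"]
    other = node_stats["other_node"]
    return other / (local + other)
```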

The Intel PCM package comes with a tool called pcm-numa.x. It tells you how many times each core accessed data from a local NUMA node, and also how many times from a remote node.