I'm running a multithreaded program on an AMD Abu Dhabi machine that has 8 NUMA domains. I'm using numactl to pin the threads to different cores and to try different memory policies. I want to measure cache hits and misses broken down by NUMA domain, but with tools like perf I only get an overall counter. I have already looked at numastat, likwid, and hpctoolkit. Do you know of any tool that reports the standard performance counters separately for each NUMA domain?
1
votes
2 Answers
1
votes
Is numastat not sufficient for your needs?
> numastat
                     node0        node1        node2        node3
numa_hit        2511148413   2668024472   2541805396   2631938751
numa_miss           687767       186973       510852        79546
numa_foreign        544853      1772504      1306738      1461626
interleave_hit       14268        14291        14281        14309
local_node      2509822983   2667700745   2541325673   2631417570
other_node         2013197       510700       990575       600727
                     node4        node5        node6        node7
numa_hit        2551615375   2287945142   2199394273   2506262343
numa_miss          1178554      1863536      2037710      1278384
numa_foreign       1709984       541463       241266       244888
interleave_hit       14287        14274        14291        14294
local_node      2551212630   2278515165   2198877939   2505436756
other_node         1581299     11293513      2554044      2103971
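If those per-node counters are close enough to what you need, a small script can turn them into a ratio per node. This is only a rough sketch: `parse_numastat`, `miss_ratio`, and `read_numastat` are names I made up, and it assumes the default `numastat` text layout shown above (a header row of `nodeN` names followed by one row per counter). Keep in mind that these counters track page allocations per node, not hardware cache misses.

```python
import subprocess


def parse_numastat(text):
    """Parse default `numastat` text output into {node: {counter: value}}.

    Assumes each block starts with a header row of node names
    (node0 node1 ...) followed by rows of "<counter> <value per node>".
    """
    stats = {}
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("node"):
            # New header row: the following counter rows refer to these nodes.
            nodes = fields
            for node in nodes:
                stats.setdefault(node, {})
        elif nodes and len(fields) == len(nodes) + 1:
            name, values = fields[0], fields[1:]
            for node, value in zip(nodes, values):
                stats[node][name] = int(value)
        # Anything else (e.g. a shell prompt line) is ignored.
    return stats


def miss_ratio(counters):
    """Fraction of page allocations that landed on a node other than the preferred one."""
    total = counters["numa_hit"] + counters["numa_miss"]
    return counters["numa_miss"] / total if total else 0.0


def read_numastat():
    """Run `numastat` and return its text output (requires the numactl package)."""
    result = subprocess.run(["numastat"], capture_output=True, text=True, check=True)
    return result.stdout
```

To use it, call `parse_numastat(read_numastat())` and print `miss_ratio(counters)` for each node. Taking a snapshot before and after your run and subtracting the two gives the counts for that interval, although the counters are system-wide, so other processes contribute to them as well.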
0
votes
The Intel PCM package comes with a tool called pcm-numa.x. It tells you how many times each core accessed data from a local NUMA node, and also how many times from a remote node.