I am using perf to measure the cache behavior of my implementation of an algorithm that is theoretically proven to be cache-friendly.
According to this article, the ratio of cache misses to instructions is a good indicator of cache performance:
The ratio of cache-misses to instructions will give an indication of how well the cache is working; the lower the ratio the better. In this example the ratio is 1.26% (6,605,955 cache-misses/525,543,766 instructions). Because of the relatively large difference in cost between the RAM memory and cache access (hundreds of cycles vs <20 cycles) even small improvements of cache miss rate can significantly improve performance. If the cache miss rate per instruction is over 5%, further investigation is required.
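The article's metric is simply the miss count divided by the instruction count. Below is a minimal Python sketch of that calculation, reproducing the article's 1.26% example. The function and constant names are hypothetical and chosen only for illustration; the 5% value comes from the quoted guideline.

```python
# Hypothetical name; the 5% value is the guideline quoted from the article.
INVESTIGATE_THRESHOLD_PERCENT = 5.0


def misses_per_instruction_percent(cache_misses: int, instructions: int) -> float:
    """Return cache misses as a percentage of retired instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return 100.0 * cache_misses / instructions


if __name__ == "__main__":
    # Example numbers quoted in the article.
    ratio = misses_per_instruction_percent(6_605_955, 525_543_766)
    print(f"{ratio:.2f}% cache misses per instruction")
    if ratio > INVESTIGATE_THRESHOLD_PERCENT:
        print("above the article's 5% guideline; worth investigating")
```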
However, when I run perf like this:
perf stat -B -e cache-references,cache-misses,instructions ./td 1.txt 2.txt
perf prints the following:
Performance counter stats for './td 1.txt 2.txt':
93,497,101 cache-references
56,452,246 cache-misses # 60.379 % of all cache refs
8,115,626,200 instructions
2.509309040 seconds time elapsed
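To compute ratios other than the one perf prints, the raw counts can be pulled out of the summary. This is a small Python sketch that assumes perf stat's default human-readable layout with comma thousands separators, as in the output above. perf stat writes this summary to stderr, so it has to be captured from there (for example with `2> stats.txt`). The `parse_perf_stat` helper is a hypothetical name, not part of perf.

```python
from __future__ import annotations

import re

# A counter line starts with a comma-grouped integer followed by the event
# name, e.g. "56,452,246 cache-misses # 60.379 % of all cache refs".
COUNTER_LINE = re.compile(r"^\s*([\d,]+)\s+([\w.:/-]+)")


def parse_perf_stat(text: str) -> dict[str, int]:
    """Extract {event name: count} from perf stat's default text output.

    Header lines and the "seconds time elapsed" line do not match the
    pattern and are skipped.
    """
    counts: dict[str, int] = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


SAMPLE = """\
Performance counter stats for './td 1.txt 2.txt':
93,497,101 cache-references
56,452,246 cache-misses # 60.379 % of all cache refs
8,115,626,200 instructions
2.509309040 seconds time elapsed
"""

if __name__ == "__main__":
    print(parse_perf_stat(SAMPLE))
```

A regex over the human-readable format is fragile. If you run this often, perf stat's machine-readable field-separator mode is a sturdier input.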
So perf reports cache misses as a percentage of cache references, rather than the cache-misses-to-instructions ratio that the article suggests.
The cache-misses to cache-references ratio looks very bad at 60%, which would mean that 60% of the time my application accesses the cache, it gets a miss. On the other hand, the cache-misses to instructions ratio is only about 0.7% (56,452,246 / 8,115,626,200).
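The two percentages are linked by a third one, references per instruction: misses/instructions = (misses/references) × (references/instructions). The following Python sketch computes all three from the counts above. The helper name `cache_ratios` is hypothetical. Also note that the perf_event_open(2) man page says the generic cache-references and cache-misses events usually count last-level cache accesses, though this varies by CPU, so these are not necessarily all cache accesses.

```python
from __future__ import annotations


def cache_ratios(references: int, misses: int, instructions: int) -> dict[str, float]:
    """Return the three related ratios as percentages."""
    return {
        # The figure perf prints next to cache-misses.
        "misses_per_reference": 100.0 * misses / references,
        # The figure the article recommends watching.
        "misses_per_instruction": 100.0 * misses / instructions,
        # How often an instruction produced a counted cache reference.
        "references_per_instruction": 100.0 * references / instructions,
    }


if __name__ == "__main__":
    # Counts from the perf stat output in the question.
    ratios = cache_ratios(93_497_101, 56_452_246, 8_115_626_200)
    for name, value in ratios.items():
        print(f"{name}: {value:.2f}%")
    # misses/instr = (misses/refs) * (refs/instr); divide by 100 once
    # because both factors are already percentages.
    product = ratios["misses_per_reference"] * ratios["references_per_instruction"] / 100
    print(f"check: {product:.2f}%")
```

With these numbers, only about 1.15% of instructions generated a counted cache reference. That is why a roughly 60% miss rate per reference still works out to roughly 0.7% misses per instruction.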
I am not sure what to make of this. Which ratio should I aim to optimize?