
Across several programs, I see that most kernels report low cache utilization for both the L2 and the unified cache: at most 3 on the 1-to-10 scale. These are not simple toy programs. Is that normal? The device is an M2000.

How is cache utilization measured? I could not find an explanation of it in the documentation.


Answer


If the kernel is limited by some other factor, for example because it is compute bound or memory bound, then it is normal for cache utilization to be low. The only way to get a really high cache utilization reading (7 or higher) is to have a lot of data reuse in that cache.

Cache utilization should be reported as a fraction of peak cache bandwidth on a scale from 0 to 10, where 10 means 100% (apparently with some normalization).
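As a rough illustration of that scale, here is a minimal sketch that maps an achieved cache throughput to a 0-10 level. It assumes a simple linear mapping that rounds down; the profiler's actual normalization is not published, and the function name and bandwidth figures are hypothetical.

```python
def utilization_level(achieved_gbps: float, peak_gbps: float) -> int:
    """Return a 0-10 utilization level for a cache.

    Hypothetical model: level = floor(10 * achieved / peak), clamped to
    the range 0..10. The real profiler normalization is not published.
    """
    if peak_gbps <= 0:
        raise ValueError("peak_gbps must be positive")
    # Fraction of peak bandwidth actually used, clamped to [0, 1].
    fraction = min(max(achieved_gbps / peak_gbps, 0.0), 1.0)
    return int(fraction * 10)


# Hypothetical numbers: a cache with 400 GB/s peak that sees 120 GB/s.
print(utilization_level(120.0, 400.0))  # 3
```

Under this model, a reading of 3 only says the cache moved about 30% of its peak bandwidth; it says nothing by itself about whether the kernel is efficient.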

The available L2 cache bandwidth is often around 2x or more the available memory (i.e. GPU DRAM) bandwidth, although the exact ratio varies by GPU and is not clearly published. Therefore, to get a reading above 5 on this metric, the data bandwidth into your code as seen at the L2 would have to be higher than the memory bandwidth. Since DRAM cannot deliver more than its own bandwidth, the extra L2 traffic has to be served from data already in the cache, which usually implies data reuse.
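The argument can be made concrete with a simple model. Suppose a kernel streams data at peak DRAM bandwidth, and each byte fetched from DRAM is read from L2 `reuse` times on average (1 means no reuse). If L2 peak bandwidth is `l2_to_dram_ratio` times DRAM peak bandwidth and the level is linear in L2 throughput, the level comes out as 10 * reuse / ratio. The model, the function name, and the parameter names are hypothetical; real kernels rarely sit exactly at peak DRAM bandwidth.

```python
def l2_level(reuse: float, l2_to_dram_ratio: float) -> float:
    """Estimated L2 utilization (0-10) for a kernel running at peak DRAM bandwidth.

    Hypothetical model:
      L2 throughput = reuse * DRAM peak
      L2 peak       = l2_to_dram_ratio * DRAM peak
      level         = 10 * L2 throughput / L2 peak, capped at 10
    """
    if reuse < 1.0:
        raise ValueError("reuse must be >= 1 (each DRAM byte is read at least once)")
    if l2_to_dram_ratio <= 0.0:
        raise ValueError("l2_to_dram_ratio must be positive")
    return min(10.0, 10.0 * reuse / l2_to_dram_ratio)


# With a 2x L2-to-DRAM ratio, print the level for a few reuse factors.
for reuse in (1.0, 1.5, 2.0):
    print(f"ratio 2x, reuse {reuse}: level {l2_level(reuse, 2.0):.1f}")
```

With a 2x ratio this prints levels 5.0, 7.5 and 10.0: a kernel with no reuse tops out at 5, and getting past 5 requires each byte to be served from L2 more than once. Under the same model, a 3x ratio would put a no-reuse kernel at about 3.3.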

It should be possible to write a microbenchmark to explore this.