The PTX documentation states that the prefetch and prefetchu instructions "prefetch line containing a generic address at a specified level of memory hierarchy, in specified state space". It also gives the syntax as:
prefetch{.space}.level [a]; // prefetch to data cache
prefetchu.L1 [a]; // prefetch to uniform cache
.space = { .global, .local };
.level = { .L1, .L2 };
I would like to know which uniform cache is being referred to here, given that the syntax (in the 2nd line) specifies that the data is prefetched into L1. Isn't prefetchu redundant when there is a prefetch instruction that allows prefetching to L1 as well? For example, what is the difference between the lines of code below?
prefetch.global.L1 [a]; // a maps to global memory.
prefetchu.L1 [a]; // a maps to global memory.
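For anyone who wants to try the two instructions side by side, here is a minimal sketch that wraps each one in inline PTX inside a CUDA kernel. The helper names (prefetch_global_l1, prefetchu_l1), the kernel, and run_copy are hypothetical names made up for illustration, not part of any CUDA API. The sketch assumes a 64-bit build and a device with managed memory support, and it says nothing about what the hardware actually does with either prefetch; it only shows a form that ptxas should accept.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): issue prefetch.global.L1 on the line
// containing p. The generic pointer is first converted to a global-space
// address, because the .global form expects an address in that space.
__device__ __forceinline__ void prefetch_global_l1(const void *p) {
    asm volatile(
        "{\n\t"
        ".reg .u64 gaddr;\n\t"
        "cvta.to.global.u64 gaddr, %0;\n\t"
        "prefetch.global.L1 [gaddr];\n\t"
        "}\n"
        :: "l"(p));
}

// Hypothetical helper: issue prefetchu.L1, which takes a generic address.
__device__ __forceinline__ void prefetchu_l1(const void *p) {
    asm volatile("prefetchu.L1 [%0];" :: "l"(p));
}

// Copies a[] to out[], issuing both prefetch flavours before the load.
__global__ void copy_with_prefetch(const float *a, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    prefetch_global_l1(&a[i]);  // data-cache path
    prefetchu_l1(&a[i]);        // uniform-cache path
    out[i] = a[i];
}

// Runs the kernel and returns the number of mismatching elements,
// or -1 if a CUDA call failed.
int run_copy(int n) {
    float *a = nullptr, *out = nullptr;
    if (cudaMallocManaged(&a, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(a);
        return -1;
    }
    for (int i = 0; i < n; ++i) { a[i] = static_cast<float>(i); out[i] = -1.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    copy_with_prefetch<<<blocks, threads>>>(a, out, n);
    int mismatches = -1;
    if (cudaDeviceSynchronize() == cudaSuccess) {
        mismatches = 0;
        for (int i = 0; i < n; ++i) {
            if (out[i] != a[i]) ++mismatches;
        }
    }
    cudaFree(a);
    cudaFree(out);
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", run_copy(1 << 16));
    return 0;
}
```

Since a prefetch is only a hint, the result of the copy should be the same with or without the two helper calls; any difference between them would show up in the generated SASS or in timing, not in the output.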
prefetch operation right after the address is discovered, and then schedule my non-dependent operations. When the content of that address is needed, hopefully it can be found in the cache. Basically, I'm trying to hide the memory access latency. – Farzad
prefetchu is using the same mechanism as LDU. Not sure it has any meaning on a non-cc2.x device. I suspect prefetch in general could be interpreted by the ptxas compiler in more than one way. Inspecting the SASS that emanates from these instructions (if any) might be instructive. – Robert Crovella
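The latency-hiding pattern described in the first comment could be sketched roughly as follows: compute the address, issue a prefetch on it, do work that does not depend on the loaded value, and only then perform the load. Everything named here (independent_work, gather_with_prefetch, run_gather) is hypothetical and exists only for illustration. The sketch uses the generic-address form prefetch.L1 with no state space, assumes managed memory is available, and makes no claim that the prefetch improves performance on any given architecture.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for work that does not depend on the prefetched data.
__host__ __device__ inline unsigned independent_work(unsigned seed) {
    unsigned h = seed;
    for (int k = 0; k < 8; ++k) h = h * 1664525u + 1013904223u;
    return h;
}

__global__ void gather_with_prefetch(const int *idx, const unsigned *a,
                                     unsigned *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // 1. The address becomes known here.
    const unsigned *target = a + idx[i];

    // 2. Hint that the line will be needed soon (generic addressing).
    asm volatile("prefetch.L1 [%0];" :: "l"(target));

    // 3. Work that does not depend on *target.
    unsigned h = independent_work(static_cast<unsigned>(i));

    // 4. The dependent load, hopefully served from cache by now.
    out[i] = h + *target;
}

// Runs the kernel and returns the number of elements that differ from a
// host-side reference, or -1 if a CUDA call failed.
int run_gather(int n) {
    int *idx = nullptr;
    unsigned *a = nullptr, *out = nullptr;
    if (cudaMallocManaged(&idx, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMallocManaged(&a, n * sizeof(unsigned)) != cudaSuccess) {
        cudaFree(idx);
        return -1;
    }
    if (cudaMallocManaged(&out, n * sizeof(unsigned)) != cudaSuccess) {
        cudaFree(idx);
        cudaFree(a);
        return -1;
    }
    for (int i = 0; i < n; ++i) {
        idx[i] = (i * 7) % n;                     // scattered access pattern
        a[i] = static_cast<unsigned>(i) * 3u;
        out[i] = 0u;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    gather_with_prefetch<<<blocks, threads>>>(idx, a, out, n);
    int mismatches = -1;
    if (cudaDeviceSynchronize() == cudaSuccess) {
        mismatches = 0;
        for (int i = 0; i < n; ++i) {
            unsigned expected =
                independent_work(static_cast<unsigned>(i)) + a[idx[i]];
            if (out[i] != expected) ++mismatches;
        }
    }
    cudaFree(idx);
    cudaFree(a);
    cudaFree(out);
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", run_gather(1 << 16));
    return 0;
}
```

To follow the suggestion about SASS, one option is to compile this file to a cubin with nvcc (using the -cubin flag and the -arch value for your device) and run cuobjdump -sass on the result, then check whether any instruction corresponding to the prefetch appears at all.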