1 vote

I'm working on data prefetching in NVIDIA CUDA. I have read some documents on prefetching on the device itself, i.e., prefetching from shared memory to cache.

But I'm interested in data prefetching between the CPU and the GPU. Can anyone point me to some documents or other resources on this topic? Any help would be appreciated.

Your question is way too broad in its current form - try asking a more specific question. You might also want to check out the NVIDIA developer forums at developer.nvidia.com. - Paul R
OK, how can I add a prefetch instruction to a given CUDA program? - username_4567
This is still very vague - prefetch what to what, exactly? For what purpose? On what generation of GPU? - Paul R

3 Answers

1 vote

Answer based on your comment:

when we want to perform computation on large data, ideally we'll send the maximum amount of data to the GPU, perform the computation, and send it back to the CPU, i.e., SEND, COMPUTE, SEND (back to CPU). Now, when the data is sent back to the CPU, the GPU has to stall. My plan is: given a CUDA program that would normally use the entire global memory, I'll compel it to run in half of the global memory, so that I can use the other half for data prefetching. While computation is being performed in one half, I can simultaneously prefetch data into the other half, so there will be no stalls. Now tell me: is this feasible? Will performance be degraded or improved? It should improve.

CUDA streams were introduced to enable exactly this approach.

If your computation is rather intensive, then yes - it can greatly improve your performance. On the other hand, if data transfers take, say, 90% of your time, you will save only on the computation time, that is, 10% at most.

The details, including examples, of how to use streams are provided in the CUDA Programming Guide. For version 4.0, that is section "3.2.5.5 Streams", and in particular "3.2.5.5.5 Overlapping Behavior", where they launch another, asynchronous memory copy while a kernel is still running.
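A minimal sketch of the two-stream idea described above (assumptions: a kernel named `process`, device buffers `d_in0`/`d_out0`/`d_in1`/`d_out1`, and page-locked host buffers `h_in0`/`h_out0`/`h_in1`/`h_out1` already allocated; all of these names are placeholders, and error checking is omitted):

```cuda
cudaStream_t stream0, stream1;
cudaStreamCreate(&stream0);
cudaStreamCreate(&stream1);

// Issue copy -> kernel -> copy-back for chunk 0 in stream0.
cudaMemcpyAsync(d_in0, h_in0, bytes, cudaMemcpyHostToDevice, stream0);
process<<<blocks, threads, 0, stream0>>>(d_in0, d_out0);
cudaMemcpyAsync(h_out0, d_out0, bytes, cudaMemcpyDeviceToHost, stream0);

// Issue the same sequence for chunk 1 in stream1. While the kernel for
// chunk 0 is still running, the host-to-device copy of chunk 1 can
// proceed concurrently (on devices with a copy engine).
cudaMemcpyAsync(d_in1, h_in1, bytes, cudaMemcpyHostToDevice, stream1);
process<<<blocks, threads, 0, stream1>>>(d_in1, d_out1);
cudaMemcpyAsync(h_out1, d_out1, bytes, cudaMemcpyDeviceToHost, stream1);

cudaDeviceSynchronize();  // wait for both pipelines to drain
```

Note that the host buffers must be allocated with `cudaHostAlloc` (or registered with `cudaHostRegister`) for `cudaMemcpyAsync` to actually overlap with kernel execution.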

0 votes

Perhaps you would be interested in the asynchronous host/device memory transfer capabilities of CUDA 4.0? You can overlap host/device memory transfers and kernels by using page-locked host memory. You could use this to...

  1. Copy working sets #1 & #2 from host to device.
  2. Process chunk #i, transfer chunk #i+1 to the device, and load chunk #i+2 on the host, all concurrently.

So you could be streaming data in and out of the GPU and computing on it all at once (!). Please refer to the CUDA 4.0 Programming Guide and CUDA 4.0 Best Practices Guide for more detailed information. Good luck!
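A rough double-buffered version of that pipeline might look like the following (assumptions: a placeholder kernel `process`, a host-side staging function `fill_chunk`, and arrays `h_buf`/`d_buf`/`stream` of size 2, all hypothetical names; error checking omitted):

```cuda
// Page-locked (pinned) host buffers are required for async overlap.
cudaHostAlloc(&h_buf[0], bytes, cudaHostAllocDefault);
cudaHostAlloc(&h_buf[1], bytes, cudaHostAllocDefault);

for (int i = 0; i < n_chunks; ++i) {
    int s = i % 2;                       // alternate buffer/stream pairs

    // Wait until the previous use of this buffer (chunk i-2) finished
    // before overwriting it on the host.
    cudaStreamSynchronize(stream[s]);

    fill_chunk(h_buf[s], i);             // stage chunk i on the host
    cudaMemcpyAsync(d_buf[s], h_buf[s], bytes,
                    cudaMemcpyHostToDevice, stream[s]);
    process<<<blocks, threads, 0, stream[s]>>>(d_buf[s]);
    cudaMemcpyAsync(h_buf[s], d_buf[s], bytes,
                    cudaMemcpyDeviceToHost, stream[s]);
}
cudaDeviceSynchronize();
```

While the kernel for chunk `i` runs in one stream, the transfers for chunk `i+1` are in flight in the other, which is exactly the SEND/COMPUTE overlap the question asks about.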

0 votes

CUDA 6 will eliminate the need to copy explicitly, i.e. the copying will be automatic (unified memory). However, you may still benefit from prefetching.

In a nutshell, you want the data for the "next" computation to be transferring while you complete the current computation. To achieve that you need at least two threads on the CPU, and some kind of signalling scheme (to know when to send the next data). The chunk size will of course play a big role and affect performance.
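A sketch of the two-thread scheme described above, using `std::thread` for the producer and stream synchronization as the signal (assumptions: pinned host buffers `h_buf[2]`, device buffers `d_buf[2]`, a placeholder kernel `process`, and a host-side `stage_chunk` function; chunk 0 is assumed to be on the device before the loop starts; all names are hypothetical):

```cuda
#include <thread>

void run_pipeline(int n_chunks) {
    cudaStream_t copy_stream, compute_stream;
    cudaStreamCreate(&copy_stream);
    cudaStreamCreate(&compute_stream);

    for (int i = 0; i < n_chunks; ++i) {
        int cur = i % 2, next = (i + 1) % 2;

        // Second CPU thread stages and uploads chunk i+1
        // while chunk i is being computed on the GPU.
        std::thread producer([&] {
            if (i + 1 < n_chunks) {
                stage_chunk(h_buf[next], i + 1);
                cudaMemcpyAsync(d_buf[next], h_buf[next], bytes,
                                cudaMemcpyHostToDevice, copy_stream);
            }
        });

        process<<<blocks, threads, 0, compute_stream>>>(d_buf[cur]);

        producer.join();                     // signal: next chunk staged
        cudaStreamSynchronize(copy_stream);  // its transfer has landed
        cudaStreamSynchronize(compute_stream);
    }
}
```

`producer.join()` plus the stream synchronization plays the role of the signalling scheme mentioned above; a real implementation would likely use events or a condition variable instead of joining a fresh thread every iteration.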

The above may be easier on an APU (CPU+GPU on the same die), as the need to copy is eliminated: both processors can access the same memory.

If you want to find papers on GPU prefetching, just use Google Scholar.