0 votes

In my device function, I store a value in the host's global memory (either pinned or zero-copy) millions of times. In my host function, a loop iterates and reads one value at a time from that memory (so that I can see each value as it is produced by the device, instead of waiting for all the values to be produced).

Which should I use between zero-copy and pinned for better performance?

What would be stopping you from benchmarking both approaches and determining the answer for yourself? – talonmies
zero-copy and pinned memory are the same thing. – Robert Crovella
@RobertCrovella That's not true. stackoverflow.com/questions/5209214/… – CottonCandy
Maybe you should read that link you provided, in its entirety, carefully: "the unified address space feature in CUDA 4.0 will cause all pinned allocations to be mapped by default." I know of no method in a 64-bit OS to create a pinned allocation that is not also mapped. – Robert Crovella
Your understanding is wrong on a 64-bit OS. I've already pointed that out. There is a 1:1 correspondence between pinned and mapped memory on a 64-bit OS, regardless of the API call you use to allocate. Furthermore, device code has no method to access host memory that is not mapped (into device address space). I certainly acknowledge that zero-copy memory will work for the purpose you describe. But "pinned but not mapped" is not an option: 1. Because you can't create that on a 64-bit OS running CUDA. 2. Because even if you could create it, you could not access it from device code. – Robert Crovella

1 Answer

1 vote

In my device function, I store a value in host's global memory (either pinned or zero-copy)

Which should I use between zero-copy and pinned for better performance?

On a 64-bit OS, where CUDA UVA is in effect, there is no meaningful difference between pinned memory and zero-copy (i.e. pinned and mapped). This is because, as stated here and elsewhere:

"the unified address space feature in CUDA 4.0 will cause all pinned allocations to be mapped by default"

The "unified address space feature in CUDA 4.0" is CUDA UVA, and it is automatically in effect on a 64-bit OS where CUDA is being used (perhaps excepting Windows 7 WDDM). Since 32-bit CUDA usage has been gradually deprecated for some time now, a 64-bit OS is presumably what most users are running currently.
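To illustrate the UVA behavior described above, here is a minimal sketch (error checking omitted; assumes a 64-bit OS with UVA in effect). The allocation uses plain cudaHostAllocDefault, with no cudaHostAllocMapped flag requested, yet the host pointer can still be passed directly to a kernel and written from device code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void write_flag(int *host_ptr) {
    *host_ptr = 42;   // device writes directly into host memory
}

int main() {
    int *h_val;
    // "Plain" pinned allocation: mapped flag not requested,
    // but under UVA it is mapped anyway and usable from device code.
    cudaHostAlloc(&h_val, sizeof(int), cudaHostAllocDefault);
    *h_val = 0;

    write_flag<<<1, 1>>>(h_val);  // host pointer passed directly (UVA)
    cudaDeviceSynchronize();

    printf("%d\n", *h_val);       // prints 42
    cudaFreeHost(h_val);
    return 0;
}
```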

However, even if you were in a non-UVA regime, there would still be no way to answer the question. The reason for this is that pinned but not mapped host memory is not directly accessible to CUDA device code read/write activity, as you are asking about in your question. It is the mapping characteristic (so called "zero-copy") that allows CUDA device code to directly read and write locations in host memory.

So the functionality you desire could not be achieved anyway if you did have access to "pinned but not mapped" host memory.
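For completeness, the use case from the question (the host reading each value as the device produces it) can be sketched with zero-copy memory roughly as follows. This is an illustrative sketch, not the asker's actual code: the kernel name produce and the count N are invented, and error checking is omitted. Note the __threadfence_system() between storing a value and publishing the count, so the host never observes the count advance before the data is visible:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 8

__global__ void produce(volatile int *data, volatile int *count) {
    for (int i = 0; i < N; ++i) {
        data[i] = i * i;            // store the value first...
        __threadfence_system();     // ...make it visible to the host...
        *count = i + 1;             // ...then publish the new count
    }
}

int main() {
    volatile int *data, *count;
    cudaHostAlloc((void **)&data,  N * sizeof(int), cudaHostAllocMapped);
    cudaHostAlloc((void **)&count, sizeof(int),     cudaHostAllocMapped);
    *count = 0;

    produce<<<1, 1>>>(data, count);

    int seen = 0;
    while (seen < N) {              // consume values as they arrive
        while (*count == seen) { }  // spin until a new value is published
        printf("value %d = %d\n", seen, data[seen]);
        ++seen;
    }
    cudaDeviceSynchronize();
    cudaFreeHost((void *)data);
    cudaFreeHost((void *)count);
    return 0;
}
```

The volatile qualifiers keep both the compiler on the host side and the device code from caching the polled locations in registers.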