In my device function, I store a value in host memory (either pinned or zero-copy).
Which should I use, zero-copy or pinned, for better performance?
On a 64-bit OS, where CUDA UVA is in effect, there is no meaningful difference between pinned memory and zero-copy (i.e. pinned and mapped). This is because, as stated here and elsewhere:
"the unified address space feature in CUDA 4.0 will cause all pinned allocations to be mapped by default"
The "unified address space feature in CUDA 4.0" is CUDA UVA, and it is automatically in effect on a 64-bit OS where CUDA is being used (perhaps excepting Windows 7 WDDM). Since 32-bit CUDA usage has been gradually deprecated for some time now, a 64-bit OS is presumably what most people are using today.
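If you want to confirm which regime you are in, the runtime reports it per device. The following is a minimal sketch, assuming device 0 is the GPU of interest; the helper name `uva_status` is made up for illustration, while `unifiedAddressing` and `canMapHostMemory` are fields of `cudaDeviceProp`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if device 0 reports UVA, 0 if it does not,
// and -1 if the properties could not be queried.
int uva_status() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    // canMapHostMemory: the device can map pinned host memory at all.
    // unifiedAddressing: host and device share one virtual address space.
    printf("canMapHostMemory=%d unifiedAddressing=%d\n",
           prop.canMapHostMemory, prop.unifiedAddressing);
    return prop.unifiedAddressing ? 1 : 0;
}

int main() {
    int status = uva_status();
    if (status < 0) {
        printf("could not query device 0\n");
        return 1;
    }
    printf(status ? "UVA is in effect\n" : "UVA is not in effect\n");
    return 0;
}
```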
However, even if you were in a non-UVA regime, there would still be no way to answer the question. The reason is that pinned but not mapped host memory is not directly accessible to CUDA device code for reads or writes, which is what your question asks about. It is the mapping characteristic (so-called "zero-copy") that allows CUDA device code to directly read and write locations in host memory.
So the functionality you desire could not be achieved anyway if you did have access to "pinned but not mapped" host memory.
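For reference, here is a rough sketch of the mapped ("zero-copy") case, where a kernel stores values straight into host memory. The names `store_index` and `zero_copy_mismatches` are hypothetical; the allocation is requested with `cudaHostAllocMapped` explicitly so that the sketch does not rely on UVA mapping it by default. On a non-UVA setup you may additionally need `cudaSetDeviceFlags(cudaDeviceMapHost)` before any other CUDA call, which is omitted here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes its global index directly into mapped host memory.
__global__ void store_index(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Hypothetical helper: returns the number of elements the host reads back
// incorrectly (0 on success), or -1 if any CUDA call fails.
int zero_copy_mismatches(int n) {
    int *h_buf = nullptr;
    // Pinned AND mapped allocation: device code may access it directly.
    if (cudaHostAlloc((void **)&h_buf, n * sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess) return -1;

    // Device-side alias of the same memory. Under UVA this equals h_buf,
    // but asking for it explicitly also works without UVA.
    int *d_alias = nullptr;
    if (cudaHostGetDevicePointer((void **)&d_alias, h_buf, 0) != cudaSuccess) {
        cudaFreeHost(h_buf);
        return -1;
    }

    store_index<<<(n + 255) / 256, 256>>>(d_alias, n);
    // No cudaMemcpy is needed; synchronize so the writes are visible to the host.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFreeHost(h_buf);
        return -1;
    }

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_buf[i] != i) ++mismatches;

    cudaFreeHost(h_buf);
    return mismatches;
}

int main() {
    int result = zero_copy_mismatches(1024);
    if (result < 0) {
        printf("CUDA error\n");
        return 1;
    }
    printf("mismatches: %d\n", result);
    return 0;
}
```

Keep in mind that every such store travels over the host interconnect (for example PCIe), so this pattern tends to suit small or write-once results rather than data the kernel touches repeatedly.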