In CUDA we can use pinned memory to more efficiently copy the data from Host to GPU than the default memory allocated via malloc at host. However there are two types of pinned memories the default pinned memory and the zero-copy pinned memory.
The default pinned memory copies the data from Host to GPU twice as fast as the normal transfers, so there's definitely an advantage (provided we have enough host memory to page-lock)
In the different version of pinned memory, i.e. zero-copy memory, we don't need to copy the data from host to GPU's DRAM altogether. The kernels read the data directly from the Host memory.
My question is: Which of these pinned-memory type is a better programming practice.