This is an interesting topic that very few people explain in detail.
I will try to define exactly how it works.
Pinned memory refers to memory that, as well as being on the device, also exists on the host, so a DMA transfer is possible between the two, which increases copy performance. That is why it needs CL_MEM_ALLOC_HOST_PTR in the buffer creation flags.
On the other hand, CL_MEM_USE_HOST_PTR takes a host pointer for buffer creation, and the spec does not say clearly whether this memory can be pinned. Generally speaking, you should NOT assume that memory created this way is pinned, since the host pointer was not reserved by the OpenCL API and it is not clear where it resides in memory.
Regarding the Map/Read question: both are OK, and they will give the same performance.
The difference between the two techniques is:
- For Map/Unmap: You need to map before writing/reading and unmap afterwards. That way you ensure the consistency of the data. These are API calls, so they take time to complete, and they can be asynchronous. The good thing is that you don't need to hold anything other than the buffer object.
- For Map+Read/Write: When you create the memory zone you need to do a Map and save the pointer value. Then, when you destroy the buffer, you need to Unmap first and then release it. You need to hold buffer+Mapped_Buffer all along. The good thing is that you can now just clEnqueueRead/Write to that mapped pointer. The API will wait for the pinned data to be consistent and then consider it done. It is easier to use, since it is like doing a map+unmap in one shot.
The Read/Write mode is easier to use, especially for repetitive reads, but it is not as versatile as the manual map option, since you CAN'T write to a read-only map, nor read from a write-only map. But in general use, the variables that are read will never be written, and vice versa.
My understanding is that the Intel recommendation means "Use Map, not plain Read/Write", rather than "When you use Map, don't use Read/Write over mapped pointers".
Did you check this NVIDIA recommendation on Intel HW? I think it should work; however, I don't know whether the operation would actually be optimal there (as it is on AMD or NVIDIA HW).