20
votes

In order to reduce the transfer time from host to device for my application, I want to use pinned memory. NVIDIA's best practices guide proposes mapping buffers and writing the data using the following code:

cDataIn = (unsigned char*)clEnqueueMapBuffer(cqCommandQue, cmPinnedBufIn, CL_TRUE, CL_MAP_WRITE, 0, memSize, 0, NULL, NULL, NULL);

for(unsigned int i = 0; i < memSize; i++)
{
    cDataIn[i] = (unsigned char)(i & 0xff);
}

clEnqueueWriteBuffer(cqCommandQue, cmDevBufIn, CL_FALSE, 0, szBuffBytes, cDataIn, 0, NULL, NULL);

Intel's optimization guide recommends using calls to clEnqueueMapBuffer and clEnqueueUnmapMemObject instead of calls to clEnqueueReadBuffer or clEnqueueWriteBuffer.

What is the right way to use pinned memory/mapped memory? Is it necessary to write the data using enqueueWriteBuffer or is enqueueMapBuffer sufficient?

Also, what is the difference between CL_MEM_ALLOC_HOST_PTR and CL_MEM_USE_HOST_PTR?


1 Answer

17
votes

This is an interesting topic that very few people explain in detail. I will try to describe exactly how it works.

Pinned memory is memory that exists on the host as well as on the device, so a DMA transfer is possible between the two copies, which increases copy performance. That is why the buffer needs CL_MEM_ALLOC_HOST_PTR in its creation flags.

CL_MEM_USE_HOST_PTR, on the other hand, takes a host pointer that you supply at buffer creation. The spec does not say whether such memory can be pinned. Generally speaking, you should NOT expect a buffer created this way to be pinned, since the host pointer was not allocated by the OpenCL API and it is not clear where it resides in memory.


Regarding the Map/Read question: both are OK, and they will give the same performance. The difference between the two techniques is:

  • For Map/Unmap: You need to map before writing/reading and unmap afterwards. That way you ensure the consistency of the data. These are API calls, so they take time to complete and can run asynchronously. The good thing is that you don't need to hold anything other than the buffer object.
  • For Map+Read/Write: When you create the memory zone, you need to do a Map and save the pointer value. Then, when you destroy the buffer, you need to Unmap first and then release it. You need to hold both the buffer and the mapped pointer the whole time. The good thing is that you can now just call clEnqueueReadBuffer/clEnqueueWriteBuffer on that mapped pointer. The API will wait for the pinned data to be consistent and then consider the operation done. It is easier to use, since it is like doing a map+unmap in one shot.

The Read/Write mode is easier to use, especially for repetitive reads, but it is not as versatile as the manual map option, since you CAN'T write to a read-only map, nor read from a write-only map. In general use, though, the buffers that are read will never be written, and vice versa.


My understanding is that Intel's recommendation means "Use Map, not plain Read/Write", rather than "When you use Map, don't use Read/Write over mapped pointers".

Did you check this NVIDIA recommendation on Intel HW? I think it should work; however, I don't know whether the operation would actually be optimal (as it is on AMD or NVIDIA HW).