1
votes

Hi everyone. OpenCL can run kernels on a CPU in much the same fashion as on a GPU. On a GPU, private memory (registers) and shared memory exist as distinct physical memories, but if I choose a CPU as the OpenCL device, how are private memory and shared memory implemented?

I mean, are they emulated in DRAM or somewhere else (L1, L2, or even L3 cache? I am not sure)? Also, the performance of using shared memory on a CPU will be limited compared with a GPU, right?


2 Answers

4
votes

No language has direct access to the CPU cache (sources were cited, but I don't have enough rep to post 3 URLs...). This in turn means that there is no way for OpenCL to keep private memory in cache.

In this presentation from AMD, the memory model is described simply as a series of memory objects abstracted by the context (page 16). As long as a buffer is available to the devices in the context, it will be readable. As for the different types of kernel memory, you can safely assume that there will be no performance difference between them when running on a CPU instead of a GPU, since on a CPU they are all just different regions of DRAM.

Keep in mind, however, that host memory and local memory would still differ if you are computing on a cluster, which would still require you to take transfer rates into account. On the second part of your question, please see this article on memory models in OpenCL. There is performance to be gained from structuring your program in such a way that you only need to communicate within a given work-group.

For further reading, please see http://software.intel.com/sites/landingpage/opencl/optimization-guide/index.htm

-1
votes

The OpenCL memory model was designed around GPU architecture. On a CPU, accesses to global memory, shared memory, and constant memory all go through the same cache hierarchy.

Of course, using local memory can still improve performance, as cache hits will increase.
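You can also ask the runtime directly whether a device has dedicated local memory: the standard `clGetDeviceInfo` query `CL_DEVICE_LOCAL_MEM_TYPE` returns `CL_LOCAL` for devices with on-chip local memory and `CL_GLOBAL` where local memory is carved out of global memory, which is what CPU implementations typically report. A sketch of the query (this fragment assumes an installed OpenCL runtime and a `cl_device_id` named `device` obtained earlier via `clGetDeviceIDs`, so it is not runnable standalone):

```c
#include <CL/cl.h>
#include <stdio.h>

void print_local_mem_type(cl_device_id device)
{
    cl_device_local_mem_type type;
    clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_TYPE,
                    sizeof(type), &type, NULL);
    /* CPUs typically report CL_GLOBAL: "local" memory is ordinary,
       cached system RAM, not a separate on-chip store. */
    printf("local mem: %s\n",
           type == CL_LOCAL ? "dedicated (CL_LOCAL)"
                            : "carved from global (CL_GLOBAL)");
}
```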