In CUDA, we can achieve kernel managed data transfer from host memory to device shared memory by device side pointer of host memory. Like this:
int *a,*b,*c; // host pointers
int *dev_a, *dev_b, *dev_c; // device pointers to host memory
…
cudaHostGetDevicePointer(&dev_a, a, 0); // mem. copy to device not need now, but ptrs needed instead
cudaHostGetDevicePointer(&dev_b, b, 0);
cudaHostGetDevicePointer(&dev_c ,c, 0);
…
//kernel launch
add<<<B,T>>>(dev_a,dev_b,dev_c);
// dev_a, dev_b, dev_c are passed into kernel for kernel accessing host memory directly.
In the above example, kernel code can access host memory via dev_a
, dev_b
and dev_c
. Kernel can utilize these pointers to move data from host to shared memory directly without relaying them by global memory.
But seems that it is an mission impossible in OpenCL? (local memory in OpenCL is the counterpart of shared memory in CUDA)