My OpenCL kernel requires a few MB of input data and about 300 MB of temporary global memory for work, and it returns only a few MB. The only way I know to give the kernel this temporary memory is to allocate it with malloc and then pass it to clCreateBuffer, but copying 300 MB to the GPU takes time and also costs 300 MB of host RAM. Is it possible to skip this and either allocate global device memory inside the kernel, or somehow declare a 300 MB buffer without allocating it with malloc and without copying it to the GPU?
If you don't enqueue a read or write on the buffer, no copying takes place, as long as the buffer is created with the right parameters.
- huseyin tugrul buyukisik
So I should call clCreateBuffer(context, CL_MEM_READ_WRITE, 300*1024*1024, NULL, NULL) and that is all?
- Mike
I'm doing this for my fluid modeling kernels; they all use many temporary buffers like this. The size would be 300*1024*1024*sizeof(cl_float), or sizeof whichever cl_ type you need. But this only applies to single-GPU usage. With multiple GPUs, you need a different approach.
- huseyin tugrul buyukisik
Thank you, I will use this approach then.
- Mike
One more thing: if a buffer is shared between kernels, it stays in sync as long as both kernels are in the same queue. When kernels in different queues share the same buffer, you may need explicit synchronization. Have fun.
- huseyin tugrul buyukisik