I would like to know how I can execute two or more different kernels in parallel, at the same time, on the same GPU using OpenCL. My idea is to use two different kernels (kernel A and kernel B) that read the same input memory: I do not want to duplicate the data by creating a separate copy of the “a” and “b” buffers for each kernel. Is there a way to run both kernels concurrently while sharing those buffers efficiently? The code of the kernels is the following:
Kernel A:
__kernel void kernelA(global struct VectorStruct* a, int aLen0,
                      global struct VectorStruct* b, int bLen0,
                      global struct VectorStruct* c, int cLen0) {
    int i = get_local_id(0);
    c[i].x = a[i].x + b[i].x;
}
Kernel B:
__kernel void kernelB(global struct VectorStruct* a, int aLen0,
                      global struct VectorStruct* b, int bLen0,
                      global struct VectorStruct* d, int cLen0) {
    int i = get_local_id(0);
    d[i].y = a[i].y + b[i].y;
}
The definition for the struct VectorStruct is the following:
struct VectorStruct { int x; int y; };
In the host code I have to create four pointers:
VectorStruct* a
VectorStruct* b
VectorStruct* c
VectorStruct* d
The pointers “a” and “b” hold the data that I will transfer to the GPU. The pointer “c” will store the results of kernel A, and the pointer “d” will store the results of kernel B.
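To make the expected result concrete, here is a minimal plain C sketch of what the two kernels compute, written as ordinary host-side loops. It is only a CPU reference for checking GPU output, not OpenCL host code. The function names kernelA_reference and kernelB_reference and the length parameter n are hypothetical and not part of the original code. It shows that both kernels only read “a” and “b” and write to different outputs, so neither kernel modifies what the other reads.

```c
#include <stddef.h>

/* Same layout as the struct used by the kernels. */
struct VectorStruct { int x; int y; };

/* Hypothetical CPU reference for kernel A: reads a and b, writes only c[i].x. */
void kernelA_reference(const struct VectorStruct *a,
                       const struct VectorStruct *b,
                       struct VectorStruct *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        c[i].x = a[i].x + b[i].x;
}

/* Hypothetical CPU reference for kernel B: reads a and b, writes only d[i].y. */
void kernelB_reference(const struct VectorStruct *a,
                       const struct VectorStruct *b,
                       struct VectorStruct *d, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        d[i].y = a[i].y + b[i].y;
}
```

Both functions take “a” and “b” as const, which mirrors the intent of the question: the inputs are shared and read-only, and only “c” and “d” are written.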