I am working on a multi-kernel OpenCL implementation and I am not sure how the different kernels map onto compute units.
All my kernels execute concurrently, and I think that a given compute unit is only assigned work-groups that execute the same kernel. From that I deduce that I have at least one compute unit for each distinct kernel I use. Am I right?
I know that I can call clGetDeviceInfo and query CL_DEVICE_MAX_COMPUTE_UNITS, but that only reports the total for the device; it does not tell me how the kernels are distributed or how many compute units I am actually using.
A related question: if I do not specify how many compute units should be used with "__attribute__((num_compute_units(X)))", how many are used?
Thanks