I am a beginner with OpenCL. I am implementing an algorithm on an AMD 8670M (GCN architecture) device, and I am using OpenCL local memory to store frequently accessed global data. According to the device specifications, the device has:
a) 5 compute units, each with 64 KB of local memory, so the device as a whole has 320 KB.
b) A maximum of 2560 work-items per compute unit.
I launched a kernel with 8 work-groups, each with 256 work-items. Each work-group uses 16 KB of local memory. So in total the kernel uses:
a) 2048 work-items
b) 128 KB of local memory
All 2048 work-items would fit on a single compute unit, but a compute unit provides only 64 KB of local memory, so at least two compute units are needed to supply the 128 KB.
As I understand it, the kernel could be launched in one of two ways:
1) The work-groups are distributed across two compute units to provide the required local memory.
2) All work-groups are assigned to a single compute unit, and the excess local memory spills to global memory.
Which of these is more likely to occur? Is there any way to check the number of active wavefronts on each compute unit? Any suggestions are appreciated. Thanks in advance.