13
votes

I know that work items are grouped into work groups, and that you cannot synchronize outside of a work group.

Does that mean that work items are executed in parallel?

If so, is it possible/efficient to make 1 work group with 128 work items?

5 Answers

12
votes

The work items within a group will be scheduled together, and may run together. It is up to the hardware and/or drivers to choose how parallel the execution actually is. There are different reasons for this, but one very good one is to hide memory latency.

On my AMD card, the 'compute units' are divided into 16 4-wide SIMD units. This means that 16 work items can technically be run at the same time in the group. It is recommended that we use multiples of 64 work items in a group to hide memory latency. Clearly they cannot all be run at the exact same time. This is not a problem, because most kernels are, in fact, memory bound, so the scheduler (hardware) will swap out the work items waiting on the memory controller while the 'ready' items get their compute time.

The actual number of work items in a group is set by the host program and limited by CL_DEVICE_MAX_WORK_GROUP_SIZE. You will need to experiment to find the optimal work-group size for your kernel.
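As a rough illustration (my own sketch, not part of the original answer): the work-group size is whatever you pass as the local work size when enqueuing the kernel. The function below assumes a cl_command_queue and a cl_kernel that were created elsewhere, and the sizes are placeholders to experiment with.

    #include <CL/cl.h>

    /* Sketch: the local_work_size argument of clEnqueueNDRangeKernel fixes the
     * number of work items per group. global_size must be a multiple of
     * local_size, and local_size may not exceed CL_DEVICE_MAX_WORK_GROUP_SIZE
     * (nor the kernel-specific limit). */
    cl_int enqueue_with_group_size(cl_command_queue queue, cl_kernel kernel)
    {
        size_t global_size = 1024;  /* total number of work items          */
        size_t local_size  = 64;    /* work items per group -- experiment  */

        return clEnqueueNDRangeKernel(queue, kernel,
                                      1,            /* 1D NDRange          */
                                      NULL,         /* no global offset    */
                                      &global_size,
                                      &local_size,  /* work-group size     */
                                      0, NULL, NULL);
    }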

The CPU implementation is 'worse' when it comes to simultaneous work items: there are only ever as many work items running as you have cores available to run them on, so execution behaves more sequentially on the CPU.

So do work items run at exactly the same time? Almost never, really. This is why we need to use barriers when we want to be sure they all pause at a given point.
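To make that concrete, here is a small made-up OpenCL C kernel (not from the original answer) where a barrier is required before a work item may read a value written by its neighbour:

    /* Each work item copies one element into local memory, then reads its
     * neighbour's slot. The barrier guarantees every item in the group has
     * finished writing before anyone reads; without it, the neighbour's slot
     * might still hold garbage, because the items do not run in lockstep. */
    __kernel void shift_left(__global const float *in,
                             __global float *out,
                             __local  float *tmp)
    {
        size_t gid = get_global_id(0);
        size_t lid = get_local_id(0);
        size_t lsz = get_local_size(0);

        tmp[lid] = in[gid];
        barrier(CLK_LOCAL_MEM_FENCE);       /* all items in the group pause here */

        out[gid] = tmp[(lid + 1) % lsz];    /* read neighbour, wrapping in the group */
    }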

6
votes

In the (abstract) OpenCL execution model, yes, all work items execute in parallel, and there can be millions of them.

Inside a GPU, all work items of the same work group must be executed on a single "core". This puts a physical restriction on the number of work items per work group (256 or 512 is the max, but it can be smaller for large kernels using a lot of registers). All work groups are then scheduled on the (usually 2 to 16) cores of the GPU.

You can synchronize threads (work items) inside a work group, because they are all resident on the same core, but you can't synchronize threads from different work groups, since they may not be scheduled at the same time and could be executed on different cores.
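A sketch of what that restriction means in practice (my own example, assuming the work-group size is a power of two): within one kernel launch, a reduction can only go as far as one partial result per work group; combining the per-group results needs a second launch or the host, since there is no barrier across work groups.

    /* Each work group sums its chunk in local memory and writes a single
     * partial sum. Cross-group combination must happen in a later step. */
    __kernel void partial_sum(__global const float *in,
                              __global float *group_sums,
                              __local  float *scratch)
    {
        size_t lid = get_local_id(0);
        scratch[lid] = in[get_global_id(0)];
        barrier(CLK_LOCAL_MEM_FENCE);

        for (size_t stride = get_local_size(0) / 2; stride > 0; stride /= 2) {
            if (lid < stride)
                scratch[lid] += scratch[lid + stride];
            barrier(CLK_LOCAL_MEM_FENCE);  /* legal: every item in the group reaches it */
        }

        if (lid == 0)
            group_sums[get_group_id(0)] = scratch[0];
    }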

Yes, it is possible to have 128 work items inside a work group, unless the kernel consumes too many resources (registers or local memory). To reach maximum performance, you usually want to have the largest possible number of threads in a work group (at least 64 are required to hide memory latency; see Vasily Volkov's presentations on this subject).
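Whether 128 actually fits for a particular kernel can be checked at run time. A hedged sketch (the kernel and device objects are assumed to exist already): clGetKernelWorkGroupInfo reports the per-kernel limit, which accounts for the registers and local memory the compiled kernel uses and can therefore be lower than the device-wide maximum.

    #include <stdio.h>
    #include <CL/cl.h>

    void print_kernel_group_limit(cl_kernel kernel, cl_device_id device)
    {
        size_t max_group = 0;
        clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                                 sizeof(max_group), &max_group, NULL);
        printf("This kernel supports up to %zu work items per group\n",
               max_group);
    }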

1
votes

The idea is that they can be executed in parallel if possible (whether they actually will be executed in parallel depends on the hardware and the implementation).

0
votes

Yes, work items are executed in parallel.

To get the maximum possible number of work items per work group, use clGetDeviceInfo with CL_DEVICE_MAX_WORK_GROUP_SIZE. It depends on the hardware.
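For example, a minimal query (a sketch that takes the first device of the first platform and omits error checking):

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platform;
        cl_device_id device;
        size_t max_group_size = 0;

        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

        clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                        sizeof(max_group_size), &max_group_size, NULL);

        printf("CL_DEVICE_MAX_WORK_GROUP_SIZE = %zu\n", max_group_size);
        return 0;
    }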

Whether it's efficient or not primarily depends on the task you want to implement. If you need a lot of synchronization, it may be that OpenCL does not fit your task. I can't say much more without knowing what you actually want to do.

0
votes

The work-items in a given work-group execute concurrently on the processing elements of a single compute unit.