I'm confused about how clEnqueueNDRangeKernel() behaves when it is called multiple times. Suppose I enqueue the same kernel 10 times (for example, in a for loop), each time with global_work_size = 32, and the kernel takes a global-memory argument that it populates with get_global_id(0).
My question is about how the global IDs are enumerated across those calls.
What I expected: The highest-numbered global_id would be (10*32-1)=319.
What actually happens: The highest-numbered global_id is (32-1)=31.
Can anyone explain how each work item is enumerated, step-by-step, as multiple clEnqueueNDRangeKernel() calls are made?