
This question is similar to "GLSL memoryBarrierShared() usefulness?".

However, I wonder when we have to use subgroupMemoryBarrier and similar functions, since subgroupBarrier performs both an execution barrier and a memory barrier. For the memoryBarrier function I understand the need, because the barrier function does not perform a memory barrier, so you must use both:

memoryBarrier(); // memoryBarrierShared, Buffer, Image...
barrier();

But I do not know when I would use subgroupMemoryBarrier, because its work is already done by subgroupBarrier.

GL_KHR_shader_subgroup extension

The function subgroupBarrier() enforces that all active invocations within a subgroup must execute this function before any are allowed to continue their execution, and the results of any memory stores performed using coherent variables performed prior to the call will be visible to any future coherent access to the same memory performed by any other shader invocation within the same subgroup.

I don't think these functions would have been added if they were not useful, so I wonder when we need to use them.

Is it because the invocations in a subgroup are assumed to run in parallel, so you can just issue a subgroupMemoryBarrier? But in that case, when do you have to use subgroupBarrier?


1 Answer


There are two very different behaviours here: memoryBarrier() and barrier(). They both have "barrier" in the name, but they have totally different effects.

Memory barriers are designed to ensure some relative ordering of memory accesses within the scope of a single thread of execution (e.g. a single compute work item). Memory accesses from before the barrier must have completed before any access after the barrier is allowed to take place. In traditional CPU code this is useful for things like locks, e.g. making sure the lock is successfully taken and written to memory before you touch the structure it protects. The execution of the threads inside the subgroup relative to each other is not affected, so you can run things in parallel without draining the pipeline, and one thread in the subgroup can run code from before the memory barrier while another is running code from after the memory barrier.
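
As a rough illustration of the ordering-only case, here is a minimal sketch of a compute shader that writes some data and then a flag, with a subgroup memory barrier in between. The buffer name Mailbox and its members payload and ready are hypothetical names made up for this example, and the local size of 32 is arbitrary. Note that subgroupMemoryBarrierBuffer() only orders the writes as seen by other invocations in the same subgroup.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require

layout(local_size_x = 32) in;

// Hypothetical buffer layout, for illustration only.
layout(std430, binding = 0) coherent buffer Mailbox {
    uint payload;
    uint ready;
};

void main() {
    if (gl_GlobalInvocationID.x == 0u) {
        payload = 42u;                  // 1. write the data
        subgroupMemoryBarrierBuffer();  // 2. order the data write before the flag write
        ready = 1u;                     // 3. publish the flag
    }
    // No execution barrier here: the other invocations are not made to
    // wait, they simply carry on with whatever comes next.
}
```

Nothing in this sketch makes another invocation wait for the flag; it only constrains the order in which the two writes become visible.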

Full barriers are designed to realign execution across the subgroup. No thread in the subgroup can run any code from after the barrier until all threads have reached the barrier, which implicitly means that they also provide memory barrier semantics. This is what you need when you want to rely on lockless algorithms in which one thread needs to make assumptions about how far another thread in the subgroup has got. For example, waiting for the thread with local invocation 0 to populate local memory.
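
A minimal sketch of that last case, assuming a compute shader with a workgroup size of 64: one elected invocation per subgroup fills a shared slot, and every invocation in the subgroup waits at subgroupBarrier() before reading it. The names scratch, Output and result are hypothetical, and the value written is arbitrary.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require

layout(local_size_x = 64) in;

// Hypothetical output buffer, for illustration only.
layout(std430, binding = 0) buffer Output {
    float result[];
};

// One slot per subgroup. 64 is a safe upper bound on the number of
// subgroups here, because the workgroup only has 64 invocations.
shared float scratch[64];

void main() {
    // Exactly one active invocation per subgroup populates the slot.
    if (subgroupElect()) {
        scratch[gl_SubgroupID] = float(gl_SubgroupID) * 2.0;
    }

    // Execution barrier plus memory barrier, scoped to the subgroup:
    // nobody reads the slot until the elected invocation has written it.
    subgroupBarrier();

    result[gl_GlobalInvocationID.x] = scratch[gl_SubgroupID];
}
```

With only subgroupMemoryBarrierShared() in place of subgroupBarrier(), the readers would not be made to wait for the writer, so the read could not safely assume the slot had been written.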