Communication between CUDA threads/thread blocks

Question

I am trying to "map" a few tasks to a CUDA GPU. There are n tasks to process (see the pseudo-code).

malloc a boolean array flag[n] and initialize it to false.
for each work-group in parallel do
    while there are still unfinished tasks do
        Do something;
        for a few j_1, j_2, .. j_m (j_i<k) do
            Wait until task j_i is finished; [ while(!flag[j_i]) ;  ]
            Do Something;
        end for
        Do something;
        Mark task k finished;  [  flag[k] = true;  ]
    end while
end for

For some reason, I have to use threads in different thread blocks.

The question is how to implement "Wait until task j_i is finished;" and "Mark task k finished;" in CUDA. My implementation uses a boolean array as the flags: I set a flag once a task is done, and read the flag to check whether a task is done.

But it only works on small cases; on a large case, the GPU crashes for an unknown reason. Is there a better way to implement Wait and Mark in CUDA?

That's basically a problem of inter-thread communication on CUDA.
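One possible reason a spin-wait like this hangs on large inputs is that a block can end up waiting on a task whose block has not been scheduled yet, since the order in which blocks start is not guaranteed. A minimal sketch of a common workaround follows, under these assumptions: each block claims tasks in increasing order through an atomic counter (so every task j < k already belongs to a running block), the flags are read through volatile pointers, and __threadfence() orders the result write before the flag write. The names chain_kernel, run_chain, and next_task are hypothetical, and the dependency (task k needs only task k-1) is a stand-in for the real j_1 .. j_m.

```cuda
#include <cuda_runtime.h>

// Hypothetical workload: task k depends on task k-1 and computes out[k] = out[k-1] + 1.
__global__ void chain_kernel(int n, int *next_task, volatile int *flag, volatile int *out)
{
    __shared__ int k;  // task currently owned by this block
    while (true) {
        // Claim tasks in increasing order, so every dependency j < k
        // is already owned by a block that is running.
        if (threadIdx.x == 0) k = atomicAdd(next_task, 1);
        __syncthreads();
        if (k >= n) return;  // same k for the whole block, so the exit is uniform

        if (threadIdx.x == 0) {
            int prev = 0;
            if (k > 0) {
                while (flag[k - 1] == 0) { }  // Wait until task k-1 is finished
                __threadfence();              // order the flag read before the data read
                prev = out[k - 1];
            }
            out[k] = prev + 1;  // "Do something"
            __threadfence();    // make out[k] visible before the flag is set
            flag[k] = 1;        // Mark task k finished
        }
        __syncthreads();  // everyone is done with k before it is overwritten
    }
}

// Host helper (hypothetical): runs the chain and returns out[n-1], which should equal n.
int run_chain(int n, int blocks = 8, int threads = 32)
{
    int *next_task, *flag, *out;
    cudaMalloc(&next_task, sizeof(int));
    cudaMalloc(&flag, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));
    cudaMemset(next_task, 0, sizeof(int));
    cudaMemset(flag, 0, n * sizeof(int));

    chain_kernel<<<blocks, threads>>>(n, next_task, flag, out);

    int last = -1;
    cudaMemcpy(&last, out + n - 1, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(next_task);
    cudaFree(flag);
    cudaFree(out);
    return last;
}
```

Calling run_chain(1000) should return 1000 if every task saw its predecessor's result. Only thread 0 of each block spins here; the other threads wait at the barrier, which avoids having threads of one warp spin on each other.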

You will have to use atomic operations for this. That costs a lot. — angainor

Fr34K · Accepted Answer · 2012-09-14T18:43:50

I don't think you need to implement this in CUDA; everything can be done on the CPU. You are waiting for a task to complete and then doing another task in no particular order. If you do want to implement it in CUDA, you don't need to wait for all the flags to become true. You know that all the flags are initially false, so just run Do something in parallel on all the threads and set each flag to true afterwards.

If you want to implement it in CUDA, use an int flag and add 1 to it each time Do something finishes, so that you can compare the flag's value before and after Do something.

If I got your question wrong, please comment and I'll try to improve the answer.
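As a rough illustration of the integer-flag idea, the sketch below assumes one thread per task and a single global counter that each thread bumps with atomicAdd after its work. The names count_kernel, count_finished, and done are hypothetical, and the "Do something" step is left as a comment.

```cuda
#include <cuda_runtime.h>

// Each thread handles one task and then increments the shared counter.
__global__ void count_kernel(int n, int *done)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < n) {
        // "Do something" for task k would go here.
        __threadfence();     // make the task's global writes visible first
        atomicAdd(done, 1);  // record that one more task has finished
    }
}

// Host helper (hypothetical): returns the counter value after the kernel has run.
int count_finished(int n, int threads = 128)
{
    int *done;
    cudaMalloc(&done, sizeof(int));
    cudaMemset(done, 0, sizeof(int));

    int blocks = (n + threads - 1) / threads;
    count_kernel<<<blocks, threads>>>(n, done);

    int result = -1;
    cudaMemcpy(&result, done, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(done);
    return result;
}
```

Comparing the counter before the launch (0) and after it (n, if every task ran) shows how many tasks completed. Note that this only counts finished tasks; it does not by itself enforce an order between them.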
