    __kernel
    void example(__global int *a, __global int *dependency, uint cols)
    {
        int j = get_global_id(0);
        int i = get_global_id(1);
        if(i > 0 && j > 0)
        {
            while(1)
            {
                test = 1;
            }
            //Wait for the dependents

        -----------------------------

        --------------------------
        }
    }

In the kernel above, why is the while loop simply skipped by every thread instead of looping forever? Any ideas? I'm working on a problem that requires a thread to wait for certain other threads to finish, based on some criteria, but whenever the kernel runs on the GPU, the loop above, or a while(wait_condition) loop, is skipped.

Is there any other way to make a particular thread wait for other threads in an OpenCL kernel on a GPU?

Thanks in advance!

It is worth pointing out that it is quite normal for not every work item queued in a kernel launch to run concurrently on the device. This means that any code employing a spinlock scheme (even a correctly designed one) can still fail with an unrecoverable deadlock, because spinning work items can be waiting for an action from a work item that has not been scheduled yet. Device-wide synchronization is usually a bad idea in GPU computing, and you are probably better served by a different algorithm or a different architecture if you need it. - talonmies

1 Answer


At a high level, GPUs are data-parallel computing devices. They are built to run the same task on different data, and they perform poorly when their tasks do different things.

Your code is illustrative of a task-parallel problem. So my high-level question is: what type of problem are you solving? If it's a task-parallel problem, then perhaps a GPU isn't the best solution. Would a multi-core CPU be an alternative?

Your code is a typical 'spinlock', where the code loops until a value changes. It's often used for short-term, lightweight locking in databases. This is dangerous code even on a CPU, as a mistake or error can lock up the CPU or GPU. In CPU code, a spinlock is usually protected with an interrupt timer. The usage is:

1) set a timer
2) spin until a value changes
3) continue or time out

After the requisite number of milliseconds, the code is interrupted and an error is thrown. So if you use the spinlock pattern, add a loop exit to the while statement for safety, so that the loop ends after a suitable number of iterations.
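
The loop-exit advice above can be sketched in plain C as follows. This is only a minimal illustration, not code from the question: the function name spin_wait_bounded and the parameters flag and max_spins are made up for the example, and the spin count stands in for the timer. The flag is read through a volatile pointer so that the compiler has to re-read it on every pass instead of assuming it never changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Spin until *flag becomes nonzero, but give up after max_spins polls.
   Returns true if the flag was seen set, false on time-out.
   The name and parameters are illustrative assumptions. */
bool spin_wait_bounded(volatile const int *flag, long max_spins)
{
    assert(flag != 0 && max_spins >= 0);
    for (long spins = 0; spins < max_spins; ++spins) {
        if (*flag != 0) {
            return true;   /* the awaited value arrived */
        }
    }
    return false;          /* time-out: the caller must handle this case */
}
```

The caller has to treat a false return as a real failure path. On a GPU this matters even more, because the work item that is supposed to set the flag may not have been scheduled yet, so the bound only turns a hang into a detectable time-out.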

In OpenCL reduction algorithms, it's typical for the zero thread (get_global_id(0) == 0) to return the final singleton result. Before that, the threads are synchronized using a barrier call (note that a barrier only synchronizes the threads within one work-group):

    __kernel
    void mytask( ... , global float * result )
    {
        int thread = get_global_id(0);

        ...  your code

        // Synchronize the work-items of this work-group and make their local
        // and global memory writes visible to each other (see the OpenCL spec).
        barrier( CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE );

        if ( thread == 0 )        // zero thread
            result[0] = value;    // store the singleton result in element 0
    }
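
To make the barrier-then-thread-zero pattern more concrete, here is a sequential emulation of a tree reduction over a single work-group, written in plain C rather than as a kernel. It is a sketch under stated assumptions: the function name emulate_group_reduce is made up, scratch stands in for the group's local memory, and n (the work-group size) is assumed to be a power of two. Each pass of the outer loop corresponds to one phase that a kernel would end with a barrier call.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential emulation of a tree reduction over one work-group.
   `scratch` plays the role of local memory and is overwritten;
   n is assumed to be a power of two. Names are illustrative. */
float emulate_group_reduce(float *scratch, size_t n)
{
    assert(scratch != 0 && n > 0 && (n & (n - 1)) == 0);
    for (size_t stride = n / 2; stride > 0; stride /= 2) {
        /* On a GPU, every work-item with lid < stride does this in parallel. */
        for (size_t lid = 0; lid < stride; ++lid) {
            scratch[lid] += scratch[lid + stride];
        }
        /* A kernel would call barrier(CLK_LOCAL_MEM_FENCE) at this point. */
    }
    return scratch[0];  /* the value that work-item 0 would write out */
}
```

Within one phase, the elements being written (index below stride) and the elements being read (index at or above stride) never overlap, which is why the only synchronization needed is the barrier between phases, and why no thread ever has to spin waiting for another.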