How can I write a statement in my CUDA kernel that is executed by only a single thread? For example, suppose I have the following kernel:
__global__ void Kernel(bool *d_over, bool *d_update_flag_threads, int no_nodes)
{
    int tid = blockIdx.x*blockDim.x + threadIdx.x;
    if( tid<no_nodes && d_update_flag_threads[tid])
    {
        ...
        *d_over=true; // writing a single memory location, only 1 thread should do?
        ...
    }
}
In the above kernel, "d_over" is a single boolean flag, while "d_update_flag_threads" is a boolean array.
What I have normally done before is use the first thread in the thread block, e.g.:
if(threadIdx.x==0)
but that does not work in this case, because I have a flag array here and only threads whose associated flag is "true" will enter the if statement. That flag array is set by another CUDA kernel called earlier, and I have no knowledge of its contents in advance.
In short, I need something similar to the "single" construct in OpenMP.
Why not close the first if, create a new if(threadIdx.x == 0) for the assignment, and then resume control with a new if? - Jared Hoberock
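For illustration, here is one possible reading of the restructuring suggested in the comment above, as a minimal sketch rather than a definitive answer. The names KernelSingleWrite and anyNodeFlagged are hypothetical and not from the original post. The use of __syncthreads_or (a CUDA intrinsic available on compute capability 2.0 and later) is an assumption added here so that thread 0 can tell whether any thread in its block had its flag set; the comment itself does not mention it. Note that this gives one write per block, not one per grid, which is harmless here because every block writes the same value.

```cuda
#include <cuda_runtime.h>

// Hypothetical variant of the kernel from the question: the single-location
// write is moved out of the flag-dependent branch.
__global__ void KernelSingleWrite(bool *d_over,
                                  const bool *d_update_flag_threads,
                                  int no_nodes)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    bool active = (tid < no_nodes) && d_update_flag_threads[tid];

    if (active) {
        // ... per-node work that came before the write ...
    }

    // Every thread of the block reaches this barrier (it is outside the if).
    // The result is non-zero in all threads if any thread passed a true predicate.
    int any_active = __syncthreads_or(active);

    if (threadIdx.x == 0 && any_active) {
        *d_over = true;  // written by at most one thread per block
    }

    if (active) {
        // ... per-node work that came after the write ...
    }
}

// Hypothetical host helper: runs the kernel on a host flag array and returns
// the resulting value of d_over. Error checking is omitted for brevity.
bool anyNodeFlagged(const bool *h_flags, int no_nodes)
{
    if (no_nodes <= 0) return false;

    bool *d_flags = nullptr;
    bool *d_over = nullptr;
    bool h_over = false;

    cudaMalloc(&d_flags, no_nodes * sizeof(bool));
    cudaMalloc(&d_over, sizeof(bool));
    cudaMemcpy(d_flags, h_flags, no_nodes * sizeof(bool), cudaMemcpyHostToDevice);
    cudaMemcpy(d_over, &h_over, sizeof(bool), cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (no_nodes + threads - 1) / threads;
    KernelSingleWrite<<<blocks, threads>>>(d_over, d_flags, no_nodes);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_over, d_over, sizeof(bool), cudaMemcpyDeviceToHost);

    cudaFree(d_flags);
    cudaFree(d_over);
    return h_over;
}
```

The important detail is that the barrier sits outside any flag-dependent branch: calling __syncthreads_or from only some threads of a block is not safe. If the work between the two halves does not need to be separated, the two if (active) blocks can of course be merged.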