CUDA: write to the same global memory location from several threads

Question

I have a kernel in which several threads write to the same array location, say array[i], in global memory. Answers to related questions here on SO suggest using atomics and similar techniques, but none of them shows actual CUDA code. Can anyone show CUDA code in which array[i], i.e. the element of array at index i, is written by several threads atomically? Thanks!

Greg Smith · Accepted Answer · 2012-08-02T02:19:04

CUDA provides intrinsic functions for atomic operations. See the CUDA C Programming Guide for details on which atomic operations are available for each compute capability. In the example below, counters is a pointer to an array of integers of size gridDim.x. Each thread increments the array element indexed by its blockIdx.x, so once the kernel finishes, counters[b] holds the number of threads in block b (provided the array was zeroed before the launch).

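__global__ void CountThreadsInBlock(int* counters)
{
    int i = blockIdx.x;
    // Every thread in the block targets counters[i]; atomicAdd performs the
    // read-modify-write as one indivisible operation, so no update is lost.
    atomicAdd(&counters[i], 1);
}

// NOTE: Assumes a 1D launch.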
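For completeness, here is a minimal host-side sketch that allocates and zeroes the counters, launches the kernel with a 1D grid, and copies the results back. The helper name countThreads and the launch dimensions are illustrative assumptions, not part of the original answer, and error handling is kept deliberately simple.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Same kernel as above: every thread in block b adds 1 to counters[b].
__global__ void CountThreadsInBlock(int* counters)
{
    int i = blockIdx.x;
    atomicAdd(&counters[i], 1);
}

// Hypothetical helper: launches a 1D grid of numBlocks x threadsPerBlock
// threads and copies the per-block counts into `out`.
// Returns false if any CUDA call fails.
bool countThreads(int numBlocks, int threadsPerBlock, std::vector<int>& out)
{
    int* d_counters = nullptr;
    size_t bytes = numBlocks * sizeof(int);
    if (cudaMalloc(&d_counters, bytes) != cudaSuccess) return false;

    // The counters must start at zero because atomicAdd accumulates into them.
    cudaError_t err = cudaMemset(d_counters, 0, bytes);
    if (err == cudaSuccess) {
        CountThreadsInBlock<<<numBlocks, threadsPerBlock>>>(d_counters);
        err = cudaGetLastError();  // catches launch-configuration errors
    }

    out.assign(numBlocks, 0);
    if (err == cudaSuccess) {
        // A blocking cudaMemcpy waits for the kernel to finish first.
        err = cudaMemcpy(out.data(), d_counters, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_counters);
    if (err != cudaSuccess)
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess;
}
```

For example, countThreads(4, 256, counts) should leave every element of counts equal to 256, since each of the 256 threads in a block adds 1 to that block's counter. A plain counters[i] += 1 is a separate load, add, and store, so concurrent threads could overwrite each other's updates and the counts could come out too low; atomicAdd rules that out.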
