CUDA: Shared Memory Assignment

Question

Suppose I define a shared variable in a cuda kernel as follows:

__shared__ int var;

Now, let's say at some point in my kernel I'd like to assign some value, say 100 to var. Saying

var = 100;

results in all threads in the block executing this assignment.

How can I have the assignment take place only once? Is this my only option:

if( threadIdx.x == 0)
    var = 100;

?

Is there anything wrong with specifying the thread to do the assignment? What sort of alternative were you looking for? — alrikai
Perhaps he is afraid of the presence of the if statement? I think that it does not penalize the performance in this case. My question is: for the first solution (without the if statement), every thread performs a memory write. Does it happen in parallel or the writes are serialized? In other words, which is the most computationally efficient solution? — Vitality
@alrikai: No, there's nothing wrong with specifying a particular thread; just thought maybe the compiler (?) would be smart enough to figure out that all threads are doing the same thing and have it done just once. Omitting the thread assignment is just a bit cleaner, semantically. — Isaac Kleinman

talonmies talonmies · Accepted Answer · 2013-05-06T20:04:29

Your only option is actually this:

if( threadIdx.x == 0)
    var = 100;

__syncthreads();

If you omit a synchronisation barrier, there is no guarantee that all threads in the block will read the value of var after the assignment statement is executed.

CUDA: Shared Memory Assignment

1 Answers