Suppose I define a shared variable in a cuda kernel as follows:
__shared__ int var;
Now, let's say at some point in my kernel I'd like to assign some value, say 100
to var
. Saying
var = 100;
results in all threads in the block executing this assignment.
How can I have the assignment take place only once? Is this my only option:
if( threadIdx.x == 0)
var = 100;
?
if
statement? I think that it does not penalize the performance in this case. My question is: for the first solution (without theif
statement), every thread performs a memory write. Does it happen in parallel or the writes are serialized? In other words, which is the most computationally efficient solution? - Vitality