I want to select only one thread per warp for a few operations.
For example, let's start with a 1-D block dim of (64, 1, 1). As I understand it, this results in two warps, given a warp size of 32. In this case, I can use the following code to select one thread per warp:
if(threadIdx.x % 32 == 0) { ... }
First of all, does this make sense? I am not sure whether we know how threads are mapped to warps on the hardware.
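To make the 1-D case concrete, here is a minimal sketch of the kind of kernel I have in mind. The names isWarpLeader1D, onePerWarp1D and leaderCount are made up for illustration. I am assuming that threads 0-31 form the first warp and threads 32-63 the second, and I use the built-in warpSize variable rather than hard-coding 32.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: true for the first thread (lane 0) of each warp,
// given a 1-D thread index. The warp width is a parameter so the same
// function can also be checked on the host.
__host__ __device__ inline bool isWarpLeader1D(unsigned tid, unsigned warpWidth)
{
    return tid % warpWidth == 0;
}

// Hypothetical kernel: each warp's first thread increments a counter once.
// Assumes a 1-D block, so threadIdx.x alone identifies the thread.
__global__ void onePerWarp1D(int *leaderCount)
{
    if (isWarpLeader1D(threadIdx.x, warpSize)) {
        atomicAdd(leaderCount, 1);  // stands in for the "few operations"
    }
}
```

Launched as onePerWarp1D<<<1, 64>>>(dCount) with dCount zero-initialized in device memory, I would expect the counter to end up at one increment per warp, if my assumption about the mapping holds.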
Secondly, how can this be achieved for a 2-D block dim of (32, 32, 1)? I suspect a simple % 32 won't work here, since the threads are indexed in two dimensions.
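For the 2-D case, the CUDA programming guide says that a thread at (x, y) in a block of size (Dx, Dy) has thread ID x + y*Dx, and as far as I understand, warps are built from consecutive thread IDs. Under that assumption, here is a sketch of what I would try. The names linearThreadId, onePerWarp2D and leaderCount are again made up for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: flatten a 1-D, 2-D or 3-D thread index into a single
// thread ID, with x varying fastest, then y, then z.
__host__ __device__ inline unsigned linearThreadId(uint3 idx, dim3 dim)
{
    return idx.x + idx.y * dim.x + idx.z * dim.x * dim.y;
}

// Hypothetical kernel: the thread whose flattened ID is a multiple of the
// warp size acts as the single representative of its warp.
__global__ void onePerWarp2D(int *leaderCount)
{
    unsigned tid = linearThreadId(threadIdx, blockDim);
    if (tid % warpSize == 0) {
        atomicAdd(leaderCount, 1);  // stands in for the "few operations"
    }
}
```

If this is right, then for a block dim of (32, 32, 1) each row y would be exactly one warp, because blockDim.x equals the warp size, and the condition would reduce to threadIdx.x == 0. For other shapes, such as (16, 16, 1), a warp would span more than one row, so the flattened ID seems to be the safer thing to test.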
Thanks.