What guarantees does CUDA give for CC 3.x:
- Are all threads of one warp always synchronized?
- Are all threads of one half-warp (but not the whole warp) always synchronized?
I.e., when execution diverges across the branches of a conditional (if, switch, ...), and the threads of the first half-warp take one branch while the threads of the second half-warp take the other, do both branches execute simultaneously, at the same moment, given that both half-warps belong to the same warp?
Or will the threads of the second half-warp be inactive (disabled) and wait for the first half-warp to complete the first branch, and then, for the second branch, the roles are swapped: the first half-warp is disabled and waits for the second half-warp to complete the second branch, even if the divergence falls exactly on a half-warp boundary (exactly 16 threads)?
if (threadIdx.x < 16) { branch_1(); }
else                  { branch_2(); }
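Below is the kind of minimal test I have in mind (the kernel name divergence_timing and the use of clock64() are my own illustration; the compiler may reorder or predicate things, so treat it only as a sketch of the question, not a definitive measurement):

    #include <cstdio>

    // Each thread records the SM clock at the start of its branch. If the two
    // half-warps are serialized, the recorded values should differ clearly
    // between threads 0-15 and threads 16-31.
    __global__ void divergence_timing(long long *t)
    {
        long long start;
        if (threadIdx.x < 16) { start = clock64(); /* branch_1(); */ }
        else                  { start = clock64(); /* branch_2(); */ }
        t[threadIdx.x] = start;
    }

    int main()
    {
        long long *d_t, h_t[32];
        cudaMalloc(&d_t, 32 * sizeof(long long));
        divergence_timing<<<1, 32>>>(d_t);   // exactly one warp
        cudaMemcpy(h_t, d_t, sizeof(h_t), cudaMemcpyDeviceToHost);
        for (int i = 0; i < 32; ++i)
            printf("thread %2d: clock %lld\n", i, h_t[i]);
        cudaFree(d_t);
        return 0;
    }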
As stated here: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability-3-0
"Then, at every instruction issue time, each scheduler issues two independent instructions for one of its assigned warps that is ready to execute, if any."
Does this mean that the two independent instructions can come from different branches (1 and 2), one for each half-warp, or does it only mean that they are two consecutive independent instructions from a single branch, issued for the whole warp?
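To make the second interpretation concrete, this is what I understand by "two independent instructions located consecutively in a single branch" (the kernel name independent_ops is just my own example):

    // Straight-line code with no divergence: the two multiplications below do
    // not depend on each other, so (as I read the quoted sentence) a scheduler
    // could issue them together for the same warp in a single cycle.
    __global__ void independent_ops(float *out, const float *a, const float *b)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float x = a[i] * 2.0f;   // independent instruction 1
        float y = b[i] * 3.0f;   // independent instruction 2
        out[i] = x + y;          // depends on both, so it cannot pair with them
    }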