I am under the impression that the (single) warp scheduler in compute capability 1.x GPUs issues one instruction per warp every 4 cycles, and since the latency of the arithmetic pipeline is 24 cycles, it can be completely hidden by having 6 active warps at any one time.
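To make my arithmetic explicit, here is a minimal sketch of how I arrive at 6 warps. The function name `warps_to_hide_latency` is my own, and the 24-cycle and 4-cycle figures are just the numbers I quoted above, not something I have measured.

```cpp
#include <cassert>

// Number of warps needed so that, by the time the scheduler has gone around
// all of them once, the first warp's previous instruction has completed.
// Both arguments are in clock cycles (assumed values, taken from the question).
constexpr int warps_to_hide_latency(int pipeline_latency, int issue_interval) {
    // Round up: a partially covered interval still needs one more warp.
    return (pipeline_latency + issue_interval - 1) / issue_interval;
}

// Hypothetical helper that checks the compute capability 1.x example:
// 24-cycle arithmetic latency, one instruction issued per warp every 4 cycles.
inline void check_cc1x_example() {
    assert(warps_to_hide_latency(24, 4) == 6);
}
```

If the issue interval were 1 cycle and the latency were still 24 cycles (both assumptions on my part), the same formula would ask for 24 warps, which is why I want to pin down the issue rate.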
For compute capability 2.1 GPUs, the Programming Guide mentions that "At every instruction issue time, each scheduler issues two independent instructions", while the post "How does the CUDA warp scheduler issue 2 instructions at a time for a warp?" suggests that each scheduler can issue one instruction per warp per cycle.
So what is the exact latency of the warp scheduler? How many cycles elapse between successive instructions issued for a given warp? And can different instructions be issued simultaneously to different active, ready warps (i.e. MIMD)?