From a resource standpoint (registers, shared memory, etc.) the important unit is the threadblock, not the warp.
In order to schedule a threadblock for execution, there must be enough free resources on the SM to cover the needs of the entire threadblock. All threadblocks in a grid will have exactly the same resource requirements.
If the SM has no currently executing threadblocks (such as at the point of kernel launch), then the SM must have at least enough resources to cover the needs of a single threadblock. If that is not the case, the kernel launch will fail. This could happen, for example, if the number of registers per thread times the number of threads per block exceeded the number of registers in the SM.
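To make the register condition above concrete, here is a minimal sketch of the check in plain host-side C++. The function name `fits_on_empty_sm` and the numbers in the usage note are hypothetical, chosen for illustration. The model is deliberately simplified: it looks at registers only and ignores allocation granularity and every other per-SM limit (shared memory, thread count, and so on).

```cpp
#include <cassert>
#include <cstdint>

// Simplified model: a threadblock can be placed on an otherwise empty SM
// only if its total register demand does not exceed the SM's register file.
// Registers only; real hardware also rounds allocations up and enforces
// other limits, which this sketch ignores.
inline bool fits_on_empty_sm(std::uint64_t regs_per_thread,
                             std::uint64_t threads_per_block,
                             std::uint64_t regs_per_sm) {
    assert(threads_per_block > 0);
    // Total registers needed by the whole threadblock.
    const std::uint64_t regs_per_block = regs_per_thread * threads_per_block;
    return regs_per_block <= regs_per_sm;
}
```

With an assumed 64K (65536) registers per SM, `fits_on_empty_sm(32, 1024, 65536)` is true (32768 registers needed), while `fits_on_empty_sm(128, 1024, 65536)` is false (131072 registers needed), which corresponds to the launch-failure case described above.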
After the SM has a single threadblock scheduled, additional threadblocks can be scheduled depending on the available resources. So to extend the register example, if each threadblock required 30K registers (regs/thread * threads/block), and the SM had a max of 64K registers, then at most two threadblocks could be scheduled (i.e. their warps could possibly be brought into execution by the SM).
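The arithmetic in that example can be sketched as an integer division. This is a rough model in host-side C++, not how the hardware or the CUDA runtime actually reports occupancy: the function name `max_blocks_by_registers` is hypothetical, and the sketch considers registers only, ignoring allocation granularity and the other limits (shared memory, maximum threads and blocks per SM) that can lower the real number.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model: how many threadblocks fit on one SM if registers are
// the only limiting resource. Integer division discards the leftover
// registers that are too few to hold another whole threadblock.
inline std::uint64_t max_blocks_by_registers(std::uint64_t regs_per_block,
                                             std::uint64_t regs_per_sm) {
    assert(regs_per_block > 0);
    return regs_per_sm / regs_per_block;
}
```

For the numbers in the text, `max_blocks_by_registers(30 * 1024, 64 * 1024)` gives 2: two threadblocks use 60K registers, and the remaining 4K cannot hold a third. A result of 0 corresponds to the case where not even one threadblock fits and the launch fails.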
In this way, any warp that could possibly be brought into execution already has enough resources allocated for it. This is a principal part of the scheduling mechanism that allows the SM to switch execution from one warp to another with zero delay (fast context switching).