Does creating a CUDA context with CU_CTX_SCHED_BLOCKING_SYNC make CUDA kernel launches actually synchronous (i.e. blocking the CPU thread until the kernel finishes, the way an ordinary function call on the same thread would)?
The documentation only states:
CU_CTX_SCHED_BLOCKING_SYNC: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the GPU to finish work.
but I'm not sure I have understood it correctly.
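For reference, here is a minimal sketch of how I would try to check this empirically. It makes a few assumptions: it uses the runtime API flag cudaDeviceScheduleBlockingSync (the runtime counterpart of CU_CTX_SCHED_BLOCKING_SYNC) rather than creating the context through the driver API, and the `spin` kernel, the `check` helper and the cycle count are made up purely for illustration. If the launch were synchronous, the first interval should be about as long as the kernel itself; otherwise nearly all of the time should show up in the synchronize call.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical kernel: busy-waits for roughly `cycles` GPU clock ticks.
__global__ void spin(long long cycles) {
    const long long start = clock64();
    while (clock64() - start < cycles) { }
}

int main() {
    // Assumption: the flag is set before the context is initialized on this device.
    check(cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync), "cudaSetDeviceFlags");
    check(cudaFree(nullptr), "context creation");

    // Warm-up launch so one-time setup costs are not counted in the timing below.
    spin<<<1, 1>>>(0);
    check(cudaDeviceSynchronize(), "warm-up");

    using host_clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    const auto t0 = host_clock::now();
    spin<<<1, 1>>>(500000000LL);          // arbitrary, long enough to measure
    const auto t1 = host_clock::now();    // time spent inside the launch call
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");
    const auto t2 = host_clock::now();    // time spent waiting for the GPU

    std::printf("launch call: %.3f ms\n", ms(t1 - t0).count());
    std::printf("synchronize: %.3f ms\n", ms(t2 - t1).count());
    return 0;
}
```

This should build with something like `nvcc -o sync_test sync_test.cu` (the file name is arbitrary). Comparing the two printed intervals with and without the cudaSetDeviceFlags call should show whether the flag changes the launch itself or only how the thread waits in the synchronize call.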