I was reading about CUDA streams and events. In the thread linked below, a moderator stated (I quote):
"In CUDA, commands submitted to a stream are guaranteed to complete in order. If the application submits a grid launch and an event record to a stream then the driver will push the grid launch, a synchronization command, and the event record to a connection. The front end will not process the event record command until the kernel launch completes and clears the synchronization token. The connection is blocked. On compute capability 3.5 devices the front end can continue to process other connections. On compute capability < 3.5 devices the front end is simply blocked."
I have tried hard, but I can't understand why the moderator says that the connection is blocked. Could someone explain, please? Thank you.
Thread URL: https://devtalk.nvidia.com/default/topic/599056/concurrent-kernel-and-events-on-kepler/?offset=4
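
To make the scenario concrete, here is a minimal sketch of what I understand the quote to describe: a grid launch followed by an event record in the same stream. The names `busy_kernel` and `launch_then_record` are my own (hypothetical), the kernel body is only there to take some time, and error handling is kept to a minimum. It only shows the behaviour visible from the host side; it does not show the driver's connections or the synchronization token.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: it only burns time so the event record has
// something ahead of it in the stream.
__global__ void busy_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 10000; ++k) {
            v = v * 1.0001f + 0.5f;
        }
        data[i] = v;
    }
}

// Hypothetical helper: submits a grid launch and then an event record to
// the same stream. Returns 0 if the event is reported complete after
// waiting on it, 1 on any failure (resources are not freed on the early
// error returns, to keep the sketch short).
int launch_then_record(int n)
{
    float *d = nullptr;
    cudaStream_t stream;
    cudaEvent_t done;

    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMemset(d, 0, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaStreamCreate(&stream) != cudaSuccess) return 1;
    if (cudaEventCreate(&done) != cudaSuccess) return 1;

    // The two commands from the quote: grid launch, then event record.
    busy_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d, n);
    cudaEventRecord(done, stream);

    // Both calls above are asynchronous, so the host gets here at once.
    // The query may report cudaErrorNotReady while the kernel is running.
    cudaError_t early = cudaEventQuery(done);
    printf("right after submission: %s\n", cudaGetErrorName(early));

    // Block the host until the event has been recorded, which cannot
    // happen before the kernel ahead of it in the stream has finished.
    cudaEventSynchronize(done);
    cudaError_t late = cudaEventQuery(done);
    printf("after synchronizing:    %s\n", cudaGetErrorName(late));

    cudaEventDestroy(done);
    cudaStreamDestroy(stream);
    cudaFree(d);
    return late == cudaSuccess ? 0 : 1;
}

int main()
{
    return launch_then_record(1 << 20);
}
```

What I see from the host is only that the event cannot complete before the kernel does. My question is about what "blocked" means one level below this, at the connection.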