1
votes

I was reading about CUDA streams and events. In the thread linked below, the moderator stated (I quote):

In CUDA, commands submitted to a stream are guaranteed to complete in order. If the application submits a grid launch and an event record to a stream then the driver will push the grid launch, a synchronization command, and the event record to a connection. The front end will not process the event record command until the kernel launch completes and clears the synchronization token. The connection is blocked. On compute capability 3.5 devices the front end can continue to process other connections. On compute capability < 3.5 devices the front end is simply blocked.

I have tried hard, but I can't understand why the moderator states that the connection is blocked. Could someone explain? Thank you.

Thread URL: https://devtalk.nvidia.com/default/topic/599056/concurrent-kernel-and-events-on-kepler/?offset=4
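
To make the scenario concrete, here is a minimal sketch of the kind of code I have in mind (the kernels, launch configurations, and stream setup are placeholders I made up, not something from the linked thread):

    // Sketch of the pattern described in the quote: a kernel launch followed
    // by an event record in the same stream, plus independent work in a
    // second stream. Kernel bodies and launch sizes are placeholders.
    __global__ void kernelA() { /* ... */ }
    __global__ void kernelB() { /* ... */ }

    int main()
    {
        cudaStream_t s1, s2;
        cudaEvent_t ev;
        cudaStreamCreate(&s1);
        cudaStreamCreate(&s2);
        cudaEventCreate(&ev);

        kernelA<<<1, 256, 0, s1>>>();   // grid launch pushed to stream s1
        cudaEventRecord(ev, s1);        // event record queued behind it in s1

        kernelB<<<1, 256, 0, s2>>>();   // independent work in stream s2

        cudaDeviceSynchronize();

        cudaEventDestroy(ev);
        cudaStreamDestroy(s1);
        cudaStreamDestroy(s2);
        return 0;
    }

My question is why, on a compute capability < 3.5 device, the event record in s1 should block kernelB in s2.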

1
Connections in this context act like command queues. Devices with compute capability < 3.5 only have a single connection/command queue. While logically there can be multiple streams, by the time the commands get sent off to the device they all end up in the same queue and cannot overtake each other anymore. So one command blocking the device side, like recording an event, blocks all others. Later devices support multiple connections, so (up to some limit) each stream gets its own command queue. – tera
@tera: that would be a perfect answer if you care to add it. – talonmies

1 Answer

3
votes

Connections in this context act like command queues. Devices with compute capability < 3.5 only have a single connection/command queue. While logically there can be multiple streams, by the time the commands get sent off to the device they all end up in the same queue and cannot overtake each other anymore.

So one command blocking the device side, like recording an event, blocks all others. Later devices support multiple connections, so (up to some limit) each stream gets its own command queue.
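
A rough way to observe this (my own sketch, not part of the original answer): time a long-running kernel plus an event record in one stream against the same kernel launched into a second stream, and compare the total against the single-kernel time. The spin kernel, cycle count, and timing scaffolding below are arbitrary choices for illustration.

    #include <cstdio>

    // Busy-wait kernel: spins for roughly the given number of GPU clocks.
    __global__ void spin(long long cycles)
    {
        long long start = clock64();
        while (clock64() - start < cycles) { }
    }

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        printf("Compute capability %d.%d\n", prop.major, prop.minor);

        cudaStream_t s1, s2;
        cudaEvent_t ev, t0, t1;
        cudaStreamCreate(&s1);
        cudaStreamCreate(&s2);
        cudaEventCreate(&ev);
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);

        const long long cycles = 100000000;  // long enough to measure

        cudaEventRecord(t0);
        spin<<<1, 1, 0, s1>>>(cycles);
        cudaEventRecord(ev, s1);             // event record queued behind spin in s1
        spin<<<1, 1, 0, s2>>>(cycles);       // independent work in s2
        cudaDeviceSynchronize();
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, t0, t1);
        // Roughly 1x the single-kernel time if the two kernels overlapped
        // (>= 3.5, separate connections); roughly 2x if everything was
        // serialized through a single connection (< 3.5).
        printf("Elapsed: %.2f ms\n", ms);

        cudaEventDestroy(ev); cudaEventDestroy(t0); cudaEventDestroy(t1);
        cudaStreamDestroy(s1); cudaStreamDestroy(s2);
        return 0;
    }

On compute capability 3.5 and later you can also limit or raise the number of hardware connections with the CUDA_DEVICE_MAX_CONNECTIONS environment variable, which is a handy way to experiment with the effect the single queue has on concurrency.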