
I am writing a mixed CPU-GPU program that requires multiple CPU threads to access multiple GPUs. Are CUDA streams thread-safe? Specifically, I wonder if the following is correct:

// two threads concurrently enter cuda device 1 and 
// launch kernel on the same stream

std::thread t1([&](){
  cudaSetDevice(1);
  cudaEventRecord(begin_t1, stream);
  kernel<<<mygrid, myblock, 0, stream>>>(...);
  cudaEventRecord(end_t1, stream);
});

std::thread t2([&](){
  cudaSetDevice(1);
  cudaEventRecord(begin_t2, stream);
  kernel<<<mygrid, myblock, 0, stream>>>(...);
  cudaEventRecord(end_t2, stream);
});
devblogs.nvidia.com/… seems to indicate CUDA is thread-safe. – Alan Birtles

CUDA in this context is thread-safe, and your code should execute without any difficulty. However, there is nothing to sort out the sequence in which the two threads place work into the single stream, so the work issued by the two threads could end up in any order. For example, the events recorded by thread 2 could capture both kernel launches. CUDA doesn't sort that out for you or provide any kind of lock for a thread issuing work into a stream. – Robert Crovella

1 Answer


It is legal for multiple host threads to access and use the same stream.

However, nothing in CUDA guarantees the order in which work from different host threads is issued into a stream. Therefore, with respect to the stream in question here, this sequence is possible:

begin_t1,kernel,end_t1,begin_t2,kernel,end_t2

but this is also possible:

begin_t1,begin_t2,kernel,kernel,end_t1,end_t2

If you want to enforce ordering between threads, you will need to do that using mechanisms provided by the threading system you are using.