I am currently developing a GPU-based program that uses multiple kernels launched concurrently on multiple streams.
In my application, several of these kernels need to access a shared queue/stack, and I plan to use atomic operations to coordinate that access.
However, I do not know whether atomic operations work correctly across multiple concurrently launched kernels. Could anyone who knows the exact mechanism of atomic operations on the GPU, or who has experience with this issue, please help?
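
To make the question concrete, here is a minimal sketch of the pattern I have in mind. It is not my real code: the `Queue` struct, the `producer` kernel, and `run_demo` are hypothetical names made up for illustration, and error checking is omitted. Two kernels in different streams each reserve slots in the same global-memory buffer with `atomicAdd` on a shared tail index.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical shared queue: a fixed-capacity buffer plus a tail index,
// both in global memory so that every kernel works on the same copy.
struct Queue {
    int *items;
    unsigned int *tail;
    unsigned int capacity;
};

// Each thread reserves one slot with atomicAdd and stores its tag there.
__global__ void producer(Queue q, int tag)
{
    unsigned int slot = atomicAdd(q.tail, 1u);
    if (slot < q.capacity)
        q.items[slot] = tag;
}

// Launches the same kernel in two streams and returns the final tail value.
unsigned int run_demo()
{
    const unsigned int threads = 256, blocks = 4;

    Queue q;
    q.capacity = 2 * threads * blocks;
    cudaMalloc(&q.items, q.capacity * sizeof(int));
    cudaMalloc(&q.tail, sizeof(unsigned int));
    cudaMemset(q.tail, 0, sizeof(unsigned int));
    cudaDeviceSynchronize();  // make sure tail is zeroed before any launch

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // The two launches may overlap on the device; that is the case I am asking about.
    producer<<<blocks, threads, 0, s1>>>(q, 1);
    producer<<<blocks, threads, 0, s2>>>(q, 2);
    cudaDeviceSynchronize();

    unsigned int count = 0;
    cudaMemcpy(&count, q.tail, sizeof(count), cudaMemcpyDeviceToHost);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(q.items);
    cudaFree(q.tail);
    return count;
}

int main()
{
    printf("slots reserved: %u\n", run_demo());
    return 0;
}
```

If `atomicAdd` on global memory is atomic across both kernels, the final tail should equal the total number of threads launched (2 * 4 * 256 = 2048) and no slot should be written twice. What I would like to understand is whether that is guaranteed when the kernels really do run at the same time.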