A theoretical question about CUDA and GPU parallel computation.
As I understand it, a kernel is a function (code) that is executed by the GPU. Each kernel is executed by a grid, which consists of blocks, and each block contains threads. So a single kernel (one piece of code) can be executed by thousands of threads at once; see the sketch below.
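For example, here is a minimal compilable sketch (the kernel name and launch dimensions are purely illustrative) that launches a grid of 128 blocks with 256 threads each, i.e. 32768 threads all running the same kernel code:

```
#include <cstdio>

// Hypothetical kernel: every thread computes its own global index;
// thread 0 reports how many threads this launch created in total.
__global__ void myKernel()
{
    int globalIdx = blockIdx.x * blockDim.x + threadIdx.x;
    if (globalIdx == 0)
        printf("total threads: %d\n", gridDim.x * blockDim.x);
}

int main()
{
    // Grid of 128 blocks, each block with 256 threads:
    // one kernel launch -> 128 * 256 = 32768 threads.
    myKernel<<<128, 256>>>();
    cudaDeviceSynchronize();
    return 0;
}
```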
I have a question about shared memory and synchronization in kernel code. Could you justify the necessity of synchronization in kernels that use shared memory? And how does that synchronization affect processing efficiency?
__syncthreads() is frequently found in kernels that use shared memory, after the shared memory load, to prevent race conditions. Since shared memory is usually loaded cooperatively (by all threads in the block), it's necessary to make sure that all threads have completed the loading operation before any thread begins to use the loaded data for further processing. - Robert Crovella
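To illustrate the pattern described in that answer, here is a minimal sketch (the kernel and its name are hypothetical; it assumes the array length is a multiple of BLOCK_SIZE and that the kernel is launched with BLOCK_SIZE threads per block). Each thread loads one element into shared memory and then reads a *different* thread's element; without the barrier between the load and the read, a thread could read a slot that its neighbor has not written yet:

```
#define BLOCK_SIZE 256

// Reverses each block-sized segment of `in` into `out`.
// Assumes gridDim.x * BLOCK_SIZE == array length.
__global__ void reverseSegments(const int *in, int *out)
{
    __shared__ int tile[BLOCK_SIZE];

    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;

    tile[t] = in[i];   // cooperative load: one element per thread

    __syncthreads();   // barrier: every load must finish before any read

    out[i] = tile[BLOCK_SIZE - 1 - t];  // read an element written by ANOTHER thread
}
```

As for efficiency: __syncthreads() is a barrier for all threads in the block, so threads stall until the slowest one arrives. That adds some latency, but the hardware barrier itself is typically cheap compared to global memory traffic, and it is the price of correctness whenever threads consume data written by other threads.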