Why aren't there bank conflicts in global memory for Cuda/OpenCL?

Question

One thing I haven't figured out, and Google isn't helping me with, is why it is possible to have bank conflicts with shared memory but not with global memory. Can there be bank conflicts with registers?

UPDATE: Wow, I really appreciate the two answers from Tibbits and Grizzly. It seems that I can only give a green check mark to one answer, though. I am newish to Stack Overflow. I guess I have to pick one answer as the best. Can I do something to say thank you to the answer I don't give a green check to?

Bank conflicts can happen at other levels of the memory hierarchy as well as in the register file. Shared memory bank conflicts can significantly impact kernel performance and are completely controllable by the developer. Other types of bank conflicts have less impact on performance and cannot be resolved by the developer so they are not communicated to the developer. — Greg Smith

M. Tibbits · Accepted Answer · 2010-10-02T19:52:47

Short Answer: There are no bank conflicts in either global memory or in registers.

Explanation:

The key to understanding why is to grasp the granularity of the operations. A single thread does not access global memory on its own. Global memory accesses are "coalesced": since global memory is so slow, accesses by the threads within a block (more precisely, within a warp or half-warp) are grouped together to make as few requests to global memory as possible.
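To make the grouping concrete, here is a small host-side C++ model (not GPU code) that counts how many memory transactions one warp would need for a given access stride. The warp size of 32, the 128-byte segment size, and the name `transactions_per_warp` are assumptions for illustration only; real segment sizes and coalescing rules differ between GPU generations.

```cpp
#include <cstddef>
#include <set>

// Assumed parameters, chosen for illustration; they vary by GPU generation.
constexpr int kWarpSize = 32;               // threads whose accesses are grouped
constexpr std::size_t kSegmentBytes = 128;  // bytes served by one transaction

// Hypothetical helper: thread t of one warp reads the 4-byte element at
// index base + t * stride. Returns how many distinct aligned segments
// (and therefore transactions, in this simplified model) the warp touches.
int transactions_per_warp(std::size_t base, std::size_t stride) {
    std::set<std::size_t> segments;
    for (int t = 0; t < kWarpSize; ++t) {
        std::size_t byte_addr = (base + t * stride) * sizeof(float);
        segments.insert(byte_addr / kSegmentBytes);
    }
    return static_cast<int>(segments.size());
}
```

Under these assumptions, `transactions_per_warp(0, 1)` gives 1 (fully coalesced), while `transactions_per_warp(0, 32)` gives 32, one transaction per thread.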

Shared memory can be accessed by several threads simultaneously. When two threads attempt to access different addresses within the same bank at the same time, this causes a bank conflict.
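As a rough sketch of how a conflict arises, the following host-side C++ model (again, not GPU code) maps each thread's word address to a bank and reports the worst-case number of different words that land in one bank. The 32 banks of 4-byte words, the warp size of 32, the treatment of identical addresses as a conflict-free broadcast, and the name `conflict_degree` are all assumptions for illustration; older devices use 16 banks and resolve conflicts per half-warp.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <set>

// Assumed parameters, chosen for illustration; they vary by GPU generation.
constexpr int kWarpSize = 32;  // threads issuing the access together
constexpr int kNumBanks = 32;  // successive 4-byte words map to successive banks

// Hypothetical helper: thread t accesses the shared-memory word t * stride.
// Returns the largest number of *different* words that fall in one bank,
// i.e. how many serialized passes the access needs in this simplified model.
int conflict_degree(std::size_t stride) {
    std::array<std::set<std::size_t>, kNumBanks> words_per_bank;
    for (int t = 0; t < kWarpSize; ++t) {
        std::size_t word = t * stride;
        words_per_bank[word % kNumBanks].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}
```

In this model a stride of 1 gives degree 1 (no conflict), a stride of 32 gives 32 (fully serialized), and a stride of 33 gives 1 again, which is why padding a shared array by one element is a common fix.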

Registers cannot be accessed by any thread except the one to which they are allocated. Since you can't read or write my registers, you can't block me from accessing them -- hence, there aren't any bank conflicts.

Who can read & write to global memory?

Only blocks. A single thread can make an access, but the transaction will be processed at the block level (actually the warp / half-warp level, but I'm trying not to be complicated). If two blocks access the same memory, I don't believe it will take longer, and it may even be accelerated by the L1 cache in the newest devices -- though this isn't obviously evident.

Who can read & write to shared memory?

Any thread within a given block. If you only have 1 thread per block you can't have a bank conflict, but you won't have reasonable performance. Bank conflicts occur because a block is allocated many threads (say, 512), and they're all vying for different addresses within the same bank (not quite the same address). There are some excellent pictures of these conflicts at the end of the CUDA C Programming Guide -- Figure G2, on page 167 (actually page 177 of the pdf). Link to version 3.2

Who can read & write to registers?

Only the specific thread to which they are allocated. Hence only one thread accesses a given register at any one time.
