0
votes

is that difference in speed due to technology with which both were made of( i read that shared memory is a scratchpad memory that is mainly SRAM memory while global memory is typically a DRAM memory)?

what if both were made with same technology, will be any differences in performance based on shared memory is on-chip and global memory is off-chip due to extra instructions(load instructions) or extra hardware circuit needed for global memory to load it's data into the processor?

1
Is this a programming question? It seems like a hardware design question.Robert Crovella

1 Answers

2
votes

At least two reasons are the ones you've already pointed out. There is a:

  1. Location difference - shared memory is on-chip, global memory (at least, ordinary global memory accesses that do not hit in one of the caches) are off-chip. Memory is generally clocked at a fixed frequency, and the maximum frequency will depend on how fast the system can be clocked. Long transmission lines, buffers that drive signals from off-chip to on-chip or vice versa, and many other circuit effects will slow down the maximum rate that a particular circuit can be clocked. Therefore the shared memory is considerably advantaged by being on-chip. The caches (L1, L2, read-only, constant cache, texture cache, etc.) all benefit from the same principle.

  2. Technology difference. An SRAM cell (e.g. shared memory) might be clocked faster than a DRAM cell (e.g. off-chip global memory), and SRAM is more amenable to fast random access. DRAM has a more complicated access sequence that comes into play when a cell is accessed. DRAM is also burdened by mechanisms such as refresh that may get in the way of continuous fast access. However I would suggest that the technology difference is less of an issue. Another technology related issue is that SRAM arrays are generally more amenable (able to be placed in higher density) on the logic processes that modern processors use. For highest density, DRAM arrays use a semiconductor process that differs substantially from the one used for general logic inside a processor.

Processor instuctrions required wouldn't be a meaningful differentiator between shared memory and global memory access times.