The actual throughput achieved by a kernel is reported by the CUDA profiler using four metrics:
- Global memory load throughput
- Global memory store throughput
- DRAM read throughput
- DRAM write throughput
The CUDA C Best Practices Guide describes global memory load/store throughput as the actual throughput, but says nothing specific about DRAM read/write throughput.
The CUPTI Users Guide defines:
- Global memory load throughput as ((128*global_load_hit) + (l2_subp0_read_requests + l2_subp1_read_requests) * 32 - (l1_cached_local_ld_misses * 128))/(gputime)
- Global memory store throughput as ((l2_subp0_write_requests + l2_subp1_write_requests) * 32 - (l1_cached_local_ld_misses * 128))/(gputime)
- DRAM read throughput as (fb_subp0_read + fb_subp1_read) * 32 / gputime
- DRAM write throughput as (fb_subp0_write + fb_subp1_write) * 32 / gputime
I understand the DRAM read/write throughput, since the fb_subp* counters report the number of DRAM accesses (incremented by 1 for each 32-byte access) and are collected across all SMs. So it is clear to me that this throughput is calculated as a function of gputime and the number of bytes accessed.
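To make that concrete, here is a minimal sketch of the conversion I have in mind (the counter values are made up, and I am assuming gputime is in microseconds and that GB/s means 10^9 bytes per second):

```python
def dram_throughput_gb_s(fb_subp0, fb_subp1, gputime_us):
    """Apply the CUPTI formula: each fb_subp* increment is one
    32-byte DRAM access; gputime is assumed to be in microseconds."""
    bytes_accessed = (fb_subp0 + fb_subp1) * 32
    seconds = gputime_us * 1e-6
    return bytes_accessed / seconds / 1e9  # GB/s

# Hypothetical counter values for illustration:
read_gbps = dram_throughput_gb_s(fb_subp0=1_000_000,
                                 fb_subp1=1_000_000,
                                 gputime_us=500.0)
print(f"DRAM read throughput: {read_gbps:.1f} GB/s")  # 128.0 GB/s
```

The write throughput would be computed the same way from fb_subp0_write and fb_subp1_write.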
I do not understand the global memory throughput definitions. There is no definition of the global_load_hit counter, and I do not see why l1_cached_local_ld_misses is subtracted in both formulas.
Is DRAM something different from global memory in this context?
If I want to know the actual throughput of my kernel, should I use the DRAM throughput metrics or the global memory throughput metrics?