Surface memory is the writable counterpart to texture memory in CUDA: surfaces support both reads and writes on CUDA arrays, whereas texture fetches are read-only.
I've found peak bandwidth numbers for NVIDIA GPUs in the academic literature for reading from global memory and shared memory. However, I've found much less information on the write throughput of the various CUDA memory spaces.
In particular, I'm interested in the write bandwidth (and latency too, if known) of CUDA surface memory on Fermi and Kepler GPUs.
- Are there benchmarking numbers on this?
- If not, how might I implement a benchmark to measure the bandwidth of writing to surface memory?
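
As a starting point for the second bullet, here is a minimal sketch of what such a benchmark might look like. It times repeated `surf2Dwrite` calls over a 2D CUDA array with CUDA events and reports the effective write bandwidth. Several things here are assumptions rather than anything established above: the helper name `measureSurfaceWriteGBps`, the `CHECK` macro, the 4096x4096 `float` array, the 32x8 block shape, and the iteration count are all arbitrary illustrative choices. It also uses the surface object API, which needs compute capability 3.0 or higher, so it would cover Kepler but not Fermi (Fermi would need the older surface reference API instead). It measures throughput only, not latency.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e_));                          \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Each thread writes one float to the surface.
__global__ void surfaceWrite(cudaSurfaceObject_t surf, int width, int height,
                             float value)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // The x coordinate of surf2Dwrite is given in bytes, not elements.
        surf2Dwrite(value, surf, x * (int)sizeof(float), y);
    }
}

// Hypothetical benchmark routine: returns effective write bandwidth in GB/s.
double measureSurfaceWriteGBps(int width, int height, int iters)
{
    // The array must be allocated with the surface load/store flag.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t array;
    CHECK(cudaMallocArray(&array, &desc, width, height,
                          cudaArraySurfaceLoadStore));

    cudaResourceDesc res;
    memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeArray;
    res.res.array.array = array;
    cudaSurfaceObject_t surf = 0;
    CHECK(cudaCreateSurfaceObject(&surf, &res));

    dim3 block(32, 8);  // assumed block shape; worth sweeping in practice
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);

    // Warm-up launch so one-time startup costs are not included in the timing.
    surfaceWrite<<<grid, block>>>(surf, width, height, 0.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaEventRecord(start));
    for (int i = 0; i < iters; ++i) {
        surfaceWrite<<<grid, block>>>(surf, width, height, (float)i);
    }
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaGetLastError());

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaDestroySurfaceObject(surf));
    CHECK(cudaFreeArray(array));

    // Total bytes written across all timed launches, divided by elapsed time.
    double bytes = (double)width * height * sizeof(float) * iters;
    return bytes / (ms * 1.0e-3) / 1.0e9;
}

int main()
{
    const int width = 4096, height = 4096, iters = 20;  // assumed sizes
    double gbps = measureSurfaceWriteGBps(width, height, iters);
    printf("Surface write bandwidth: %.2f GB/s\n", gbps);
    return 0;
}
```

This should build with something like `nvcc -O3 -arch=sm_35 bench.cu` on a toolkit that still supports Kepler. The array should be large enough to exceed the L2 cache, otherwise the figure may reflect cache behaviour more than DRAM writes, and comparing against an equivalent plain global-memory store kernel would give a useful baseline.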