In CUDA, does UVA depend on any hardware features?

Question

I know CUDA only got UVA (Unified Virtual Addressing) with version 4.0. But is that purely a software feature, or does it require some kind of hardware support (on the GPU side, I mean)?

Notes:

In this GTC 2011 presentation it says a Fermi-class GPU is necessary for P2P copies, but it doesn't say whether one is necessary for UVA itself.
I know UVA is not a good idea on a 32-bit-CPU system; that is not the kind of hardware support I mean.

This seems like a question about "general computing hardware and software", which would be off topic here. I would suggest enhancing it with a direct tie-in to programming, if possible. — njuffa
@njuffa: No, it isn't general, it's actually quite specific; see some of the related UVA questions to see what they're like. — einpoklum
This is a Q&A site for programming questions. There may well be some connection between CUDA programming and whether UVA depends on the addition of a particular hardware feature to GPUs, but I am not seeing it right now, and it does not seem to be spelled out in the question. — njuffa
@njuffa: This is a question about programming with CUDA: is a newer version of CUDA enough for me to be able to use UVA, or not? So, a programming question. (Also, if you applied this criterion seriously, you would lose a large fraction of the highest-voted questions on the site, I believe.) — einpoklum
I think I am catching on. I originally understood your question as a request for information about what kind of MMU/TLB/etc. gizmo had been added to the GPU to support UVA. I now think you are asking "What is the minimal compute capability required to use UVA?", which is on-topic, of course. — njuffa

Gilles · Accepted Answer · 2016-02-22T07:17:24

UVA, introduced in May 2011 with CUDA 4.0, requires hardware support in the form of a Fermi-class GPU, which means compute capability 2.0 or higher.
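Rather than inferring UVA support from the CUDA version alone, a program can ask the runtime. The sketch below, which assumes the CUDA runtime API and is not part of the original answer, prints each device's compute capability next to `cudaDeviceProp::unifiedAddressing`. The helper `meetsUvaComputeCapability` is a hypothetical name that only encodes the compute-capability-2.0 threshold stated above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: encodes the "Fermi (compute capability 2.0) or newer"
// threshold. It is a necessary condition for UVA, not a sufficient one.
bool meetsUvaComputeCapability(int major) {
    return major >= 2;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        // unifiedAddressing is 1 if the device shares a unified address space
        // with the host and 0 otherwise; this is the runtime's own answer.
        std::printf("device %d (%s): cc %d.%d, Fermi or newer: %s, unifiedAddressing: %d\n",
                    dev, prop.name, prop.major, prop.minor,
                    meetsUvaComputeCapability(prop.major) ? "yes" : "no",
                    prop.unifiedAddressing);
    }
    return 0;
}
```

Checking `unifiedAddressing` directly is the more robust choice, since the compute-capability test alone cannot see the platform restrictions discussed next.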

But apparently that's not enough: according to slide #17 of this presentation of the new features of CUDA 4.0, UVA seems to be supported only for 64-bit applications (which makes sense, since otherwise you would run out of address space pretty quickly) and, on Windows, only with TCC (Tesla Compute Cluster). I'm not sure whether this latter limitation still exists, since I have never developed on Windows.
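The 64-bit requirement can be checked through pointer width, and the Windows driver model through `cudaDeviceProp::tccDriver`. The sketch below is an illustration assuming a reasonably recent CUDA toolkit, not something taken from the slides; `is64BitProcess` is a hypothetical helper name. When the runtime reports `unifiedAddressing`, it also performs a round-trip copy with `cudaMemcpyDefault`, which asks the runtime to infer the transfer direction from the pointer values and is only allowed on systems with UVA.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: UVA needs a 64-bit process, because host memory and
// every device's allocations must share one virtual address space.
constexpr bool is64BitProcess() { return sizeof(void*) == 8; }

int main() {
    if (!is64BitProcess()) {
        std::printf("32-bit build: UVA is not available\n");
        return 0;
    }
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no usable CUDA device\n");
        return 1;
    }
    // tccDriver is 1 if the device uses the TCC driver (a Windows driver
    // model) and 0 otherwise.
    std::printf("tccDriver: %d, unifiedAddressing: %d\n",
                prop.tccDriver, prop.unifiedAddressing);
    if (!prop.unifiedAddressing) return 0;

    const int n = 4;
    int host[n] = {1, 2, 3, 4};
    int back[n] = {0, 0, 0, 0};
    int* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return 1;
    // With UVA the runtime can tell host pointers from device pointers, so
    // both copies pass cudaMemcpyDefault instead of an explicit direction.
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyDefault);
    cudaMemcpy(back, dev, n * sizeof(int), cudaMemcpyDefault);
    cudaFree(dev);
    std::printf("round trip: %d %d %d %d\n", back[0], back[1], back[2], back[3]);
    return 0;
}
```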
