I have a Linux box with two GTX 590 cards (4 GPUs in total). With the CUDA 4.0 driver, I can enable GPUDirect peer-to-peer access and verify successful copies between ALL possible pairs of my 4 GPUs.
However, after upgrading to the CUDA 4.1 driver (or any later driver), peer-to-peer access is available between only some of the pairs.
For example, peer-to-peer access is enabled between the following pairs under CUDA 4.0:
GPU0 <-> GPU1
GPU0 <-> GPU2
GPU0 <-> GPU3
GPU1 <-> GPU2
GPU1 <-> GPU3
GPU2 <-> GPU3
But under CUDA 4.1 (or later), access is limited to only:
GPU0 <-> GPU1 (same card)
GPU2 <-> GPU3 (same card)
GPU1 <-> GPU3
Can anyone explain this, or does anyone know of a workaround when using the latest CUDA 5.x drivers?
$ lspci -tv (the interesting part) gives:
-[0000:00]-+-00.0 ATI Technologies Inc RD890 Northbridge only single slot PCI-e GFX Hydra part
           +-02.0-[0c-0f]----00.0-[0d-0f]--+-00.0-[0f]--+-00.0 nVidia Corporation Device 1088
           |                               |            \-00.1 nVidia Corporation GF110 High Definition Audio Controller
           |                               \-02.0-[0e]--+-00.0 nVidia Corporation Device 1088
           |                                            \-00.1 nVidia Corporation GF110 High Definition Audio Controller
           :
           +-0b.0-[04-07]----00.0-[05-07]--+-00.0-[07]--+-00.0 nVidia Corporation Device 1088
           |                               |            \-00.1 nVidia Corporation GF110 High Definition Audio Controller
           |                               \-02.0-[06]--+-00.0 nVidia Corporation Device 1088
           |                                            \-00.1 nVidia Corporation GF110 High Definition Audio Controller
To me, it looks like all paths are physically available (the topology is a tree), and they do work under CUDA 4.0. Under CUDA 4.1 and later, however, cudaDeviceCanAccessPeer() reports false for "cross card" communication. Note that ALL host-to-device paths are always available (of course).
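
For anyone who wants to reproduce the pair listing, here is a minimal sketch of the kind of check I mean. It only queries cudaDeviceCanAccessPeer() for every ordered pair of devices and prints the result; it does not enable peer access or copy anything. The helper names check() and peerStatus(), and the file name in the build comment, are made up for this example.

```cuda
// Minimal sketch: list which GPU pairs report peer-to-peer capability.
// Assumed build command: nvcc -o p2p_check p2p_check.cu
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: turn the flag written by cudaDeviceCanAccessPeer()
// into a printable string.
static const char *peerStatus(int canAccess) {
    return canAccess ? "yes" : "no";
}

// Hypothetical helper: report a failed runtime call; returns true on success.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    int count = 0;
    if (!check(cudaGetDeviceCount(&count), "cudaGetDeviceCount")) return 1;

    // Query every ordered pair, since the answer is reported per direction.
    for (int src = 0; src < count; ++src) {
        for (int dst = 0; dst < count; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            if (!check(cudaDeviceCanAccessPeer(&canAccess, src, dst),
                       "cudaDeviceCanAccessPeer")) return 1;
            std::printf("GPU%d -> GPU%d : %s\n", src, dst, peerStatus(canAccess));
        }
    }
    return 0;
}
```

Running this under each driver version and comparing the output is enough to see the difference between the two lists above; the exact output depends on the machine and driver.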