The data rate of cudaMemcpy operations is heavily influenced by the number of PCI-e 3.0 (or 2.0) lanes allocated between the CPU and the GPU. I'm curious about how PCI-e lanes are used on Nvidia devices that contain two GPUs.
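For context, this is roughly how I'm measuring the transfer rate (a minimal sketch with error checking omitted; the 256 MB buffer size is an arbitrary choice, and I'm using pinned host memory since pageable copies won't saturate the bus):

```c
// Time a large pinned host-to-device cudaMemcpy to estimate the
// effective PCI-e data rate.
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t bytes = 256 << 20;  // 256 MB (arbitrary)
    void *h, *d;
    cudaMallocHost(&h, bytes);       // pinned memory, needed for full PCI-e throughput
    cudaMalloc(&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("H2D: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```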
Nvidia has a few products that have two GPUs on a single PCI-e device. For example:
- The GTX 590 contains two Fermi GF110 GPUs
- The GTX 690 contains two Kepler GK104 GPUs
As with many newer graphics cards, these devices mount in PCI-e x16 slots. For cards that contain only one GPU, the GPU can use all 16 PCI-e lanes.
If I have a device containing two GPUs (like the GTX 690), but I'm running compute jobs on only one of those GPUs, can all 16 PCI-e lanes serve the one GPU that is being used?
To show this as ASCII art:

[ GTX690 (2x GK104) ] ------ 16 PCI-e lanes ------ [ CPU ]
I'm not talking about the case where the CPU is connected to two separate cards with one GPU each, like the following:
[ GTX670 (1x GK104) ] ------ PCI-e lanes ----- [ CPU ] ------ PCI-e lanes ----- [ GTX670 (1x GK104) ]
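In case it's relevant: I know the two GPUs of a GTX 690 enumerate as separate CUDA devices, and I've been trying to check the negotiated link width per GPU with NVML (sketch below; I'm assuming nvmlDeviceGetCurrPcieLinkWidth reports what each GPU actually negotiated, which is exactly the part I'm unsure about for GPUs sitting behind a dual-GPU card):

```c
// Query the negotiated PCI-e link generation and width of each GPU
// via NVML. Compile/link with -lnvidia-ml.
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    unsigned int count, i;
    nvmlReturn_t r = nvmlInit();
    if (r != NVML_SUCCESS) {
        printf("nvmlInit failed: %s\n", nvmlErrorString(r));
        return 1;
    }

    nvmlDeviceGetCount(&count);
    for (i = 0; i < count; i++) {
        nvmlDevice_t dev;
        unsigned int width, gen;
        char name[64];
        nvmlDeviceGetHandleByIndex(i, &dev);
        nvmlDeviceGetName(dev, name, sizeof(name));
        nvmlDeviceGetCurrPcieLinkWidth(dev, &width);
        nvmlDeviceGetCurrPcieLinkGeneration(dev, &gen);
        printf("GPU %u (%s): PCI-e gen %u, x%u\n", i, name, gen, width);
    }

    nvmlShutdown();
    return 0;
}
```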