
I'm using CUDA on a dual-GPU system built around NVIDIA GTX 590 cards, and I have an array partitioned according to the figure below.

If I use cudaSetDevice() to split the sub-arrays across the GPUs, will they share the same global memory? Can the first device access the updated data on the second device and, if so, how?

Thank you.

[Figure: the array partitioned into sub-arrays across the two GPUs]


1 Answer


Each device's memory is separate, so if you call cudaSetDevice(A) and then cudaMalloc(), you are allocating memory on device A. If you subsequently access that memory from device B, you will see higher access latency, since every access has to travel over the external PCIe link.
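For example, here is a minimal sketch of that pattern (the device numbers, array size, and the scale kernel are all illustrative): memory is allocated on device 0, then read directly by a kernel launched on device 1. Note that cudaDeviceEnablePeerAccess() must succeed first, which requires a 64-bit process with unified virtual addressing:

```
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 2.0f * in[i];   // 'in' may live on the other GPU
}

int main()
{
    const int n = 1 << 20;
    float *a0, *b1;

    cudaSetDevice(0);
    cudaMalloc(&a0, n * sizeof(float));   // resides on device 0

    cudaSetDevice(1);
    cudaMalloc(&b1, n * sizeof(float));   // resides on device 1
    cudaDeviceEnablePeerAccess(0, 0);     // let device 1 dereference device 0 pointers

    // Kernel runs on device 1 but reads a0 (device 0 memory) directly;
    // each such access crosses the PCIe link, hence the higher latency.
    scale<<<(n + 255) / 256, 256>>>(a0, b1, n);
    cudaDeviceSynchronize();

    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}
```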

An alternative strategy would be to partition the result across the GPUs and store on each GPU all the input data it needs. This means some duplication of data, but it is common practice in GPU programming (and indeed in any parallel model such as MPI) - you'll often hear the term "halo" applied to the data regions that need to be transferred between updates; see the sketch below.
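As an illustration of the halo pattern, assume a row-wise split of a 2D array across two GPUs (the buffer layout, the exchange_halos helper, and the sizes are all assumptions, not part of the question). Each GPU holds its half of the rows plus one halo row from its neighbour, and the boundary rows are exchanged between updates with cudaMemcpyPeer():

```
#include <cuda_runtime.h>

// d[g] points to (half + 2) rows of width floats on GPU g:
// row 0 is the top halo, rows 1..half are interior, row half+1 is the bottom halo.
void exchange_halos(float *d[2], int width, int half)
{
    size_t row = width * sizeof(float);

    // Last interior row of GPU 0 -> top halo row of GPU 1.
    cudaMemcpyPeer(d[1],                      1,
                   d[0] + half * width,       0, row);

    // First interior row of GPU 1 -> bottom halo row of GPU 0.
    cudaMemcpyPeer(d[0] + (half + 1) * width, 0,
                   d[1] + width,              1, row);
}
```

A nice property of cudaMemcpyPeer() is that it works whether or not peer access is enabled: with peer access it copies directly over PCIe, otherwise it is staged through host memory.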

Note that you can check whether one device can access another's memory using cudaDeviceCanAccessPeer(); in cases where you have a dual-GPU card such as the GTX 590, this is always true.
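A minimal query (purely illustrative) that loops over every device pair and enables peer access where the hardware supports it:

```
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int a = 0; a < count; ++a)
        for (int b = 0; b < count; ++b)
            if (a != b) {
                int can = 0;
                cudaDeviceCanAccessPeer(&can, a, b);
                printf("GPU %d -> GPU %d peer access: %s\n",
                       a, b, can ? "yes" : "no");
                if (can) {
                    cudaSetDevice(a);             // enable access from a...
                    cudaDeviceEnablePeerAccess(b, 0); // ...to b's memory
                }
            }
    return 0;
}
```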