0 votes

I am working on a simulation that runs on a host and uses the GPU for the computation. Once the computation is done, the host copies the memory from the device to itself and then sends the computed data to a distant host.

Basically the data path is: GPU -> HOST -> NETWORK CARD

Since the simulation runs in real time, timing is critical, and I would like something like this: GPU -> NETWORK CARD, in order to reduce the data-transfer delay.
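The two-hop path described above can be sketched as follows (a minimal illustration; `d_result`, `n`, and the already-connected socket `sockfd` are hypothetical placeholders, and error checking is omitted):

```cuda
#include <cuda_runtime.h>
#include <sys/socket.h>

// Hypothetical: d_result is the device buffer the simulation kernel wrote,
// sockfd is an already-connected TCP socket to the distant host.
void send_results(const float *d_result, size_t n, int sockfd)
{
    float *h_buf = NULL;
    // Pinned host memory speeds up the device-to-host copy,
    // but the data still makes an extra stop in host RAM.
    cudaMallocHost((void **)&h_buf, n * sizeof(float));

    // Hop 1: GPU -> HOST
    cudaMemcpy(h_buf, d_result, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Hop 2: HOST -> NETWORK CARD
    send(sockfd, h_buf, n * sizeof(float), 0);

    cudaFreeHost(h_buf);
}
```

Every frame of the simulation pays for both hops, which is the latency the question is trying to eliminate.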

Is it possible? If not, is it something we might see someday?

Edit: Distant host => CPU

1
That is exactly what GPUDirect is about. - tera

1 Answer

2 votes

Yes, this is possible in CUDA 4.0 and later using the GPUDirect facility on platforms which support unified direct addressing (which I think is basically Linux with Fermi or Kepler Tesla cards at this stage). You haven't said much about what you mean by "distant host", but if you have a network where MPI is feasible, there is probably a ready solution for you to use.

At least MVAPICH2 already supports GPU-to-GPU transfers over either InfiniBand or TCP/IP, including RDMA directly to the InfiniBand adapter over the PCI Express bus. Other MPI implementations probably have support by now as well, although I haven't looked at them closely enough recently to know for sure.
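With a CUDA-aware MPI build such as MVAPICH2 compiled with GPU support, you can pass the device pointer straight to MPI and let the library perform the transfer without an explicit staging copy. A minimal sketch (rank assignments and buffer size are illustrative, and the kernel launch is elided):

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;          // illustrative buffer size
    float *d_buf;
    cudaMalloc(&d_buf, n * sizeof(float));

    if (rank == 0) {
        // ... launch the simulation kernel writing into d_buf ...
        // Pass the device pointer directly: a CUDA-aware MPI
        // recognizes it and moves the data without an explicit
        // cudaMemcpy through host memory.
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

Whether the transfer actually goes GPU -> network adapter without touching host RAM depends on the MPI build and the hardware; a non-CUDA-aware MPI would simply crash or misbehave on a device pointer.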