I am working on a simulation that runs on a host and uses the GPU for the computation. Once the computation is done, the host copies the results from device memory into its own memory and then sends the computed data to a distant host.
Basically, the data follows this path: GPU -> HOST -> NETWORK CARD
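To make that two-hop path concrete, here is a minimal sketch of what the host does today. It is only an illustration under stated assumptions: `copy_device_to_host`, `stage_and_send` and `recv_exact` are hypothetical names, and the device-to-host copy is replaced by a plain in-memory copy so the snippet runs without a GPU (in a real program that step would be something like a `cudaMemcpy` with `cudaMemcpyDeviceToHost`).

```python
import socket


def copy_device_to_host(device_buf):
    # Hypothetical stand-in for the real device-to-host copy.
    # Here it only copies bytes within host RAM so the sketch runs anywhere.
    return bytes(device_buf)


def stage_and_send(sock, device_buf):
    # Hop 1: GPU -> HOST (staging copy into host memory).
    host_buf = copy_device_to_host(device_buf)
    # Hop 2: HOST -> NETWORK CARD (ordinary socket send from host memory).
    sock.sendall(host_buf)
    return len(host_buf)


def recv_exact(sock, n):
    # Helper for the receiving side: read exactly n bytes.
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before all data arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # A local socket pair stands in for the link to the distant host.
    sender, receiver = socket.socketpair()
    result = bytearray(range(256)) * 4  # pretend this lives in device memory
    sent = stage_and_send(sender, result)
    assert recv_exact(receiver, sent) == bytes(result)
    sender.close()
    receiver.close()
```

The staging copy in `stage_and_send` is the hop I would like to get rid of.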
Since the simulation runs in real time, latency is very important, and I would like to have something like this instead: GPU -> NETWORK CARD, in order to reduce the data-transfer delay.
Is it possible? If not, is it something that we might see someday?
Edit: the distant host is a CPU.