Within CUDA kernel, copying from device to host without pinned memory?

Question

Is there a way to copy from device to host within the kernel?

Something like the following code:

__global__ void kernel(int n, double *devA, double *hostA) {
    double x = 1.0;

    do_computation();

    cudaMemcpy(hostA, &x, sizeof(double), cudaMemcpyDeviceToHost);

    do_computation();

    cudaMemcpy(hostA, devA, sizeof(double), cudaMemcpyDeviceToHost);
}

Is it possible? Based on the CUDA documentation, the cudaMemcpy is not callable from the device, right?

NOTE: I don't want to use pinned memory. It performs poorly in my case because I will be constantly checking the host variable (memory), so using pinned memory will trigger a page fault (at best, on post-Pascal GPUs), and that will definitely happen. If both host and device access the same location, the result is basically a ping-pong effect.

talonmies · Accepted Answer · 2018-02-26T05:36:04

Is it possible?

In one word, no.

Based on the CUDA documentation, the cudaMemcpy is not callable from the device, right?

In fact, if you read the documentation, you will see that cudaMemcpy is supported in device code, but only for device-to-device transfers, and local variables cannot be used as the source or destination.
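Since the copy back to the host cannot be issued from inside the kernel, one way to get the two intermediate values is to split the work into two kernels and call cudaMemcpy from the host between them. What follows is only a minimal sketch under assumptions, not the asker's actual code: phase1, phase2, and runTwoPhases are hypothetical names standing in for the two do_computation() phases, and the values written are placeholders. The host destination is ordinary pageable memory, not pinned.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the first do_computation() phase. The result is
// written to device global memory rather than kept in a local variable.
__global__ void phase1(double *devA) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        devA[0] = 1.0;  // placeholder value
    }
}

// Hypothetical stand-in for the second phase, updating the same buffer.
__global__ void phase2(double *devA) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        devA[0] += 1.0;  // placeholder update
    }
}

// Runs both phases and copies the device value to pageable host memory after
// each one. Every copy is issued from host code, never from the kernel.
cudaError_t runTwoPhases(double *afterPhase1, double *afterPhase2) {
    double *devA = nullptr;
    cudaError_t err = cudaMalloc(&devA, sizeof(double));
    if (err != cudaSuccess) return err;

    phase1<<<1, 32>>>(devA);
    // A blocking cudaMemcpy on the default stream runs after phase1 finishes.
    err = cudaMemcpy(afterPhase1, devA, sizeof(double), cudaMemcpyDeviceToHost);

    if (err == cudaSuccess) {
        phase2<<<1, 32>>>(devA);
        err = cudaMemcpy(afterPhase2, devA, sizeof(double), cudaMemcpyDeviceToHost);
    }

    cudaFree(devA);
    return err;
}

int main() {
    double first = 0.0, second = 0.0;
    cudaError_t err = runTwoPhases(&first, &second);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("after phase1: %f\n", first);
    printf("after phase2: %f\n", second);
    return 0;
}
```

This should build with something like `nvcc example.cu -o example`. The trade-off is that each intermediate value costs a kernel boundary, so it only helps if the computation can be split at the points where the host needs to see the data.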
