2 votes

Is it valid to free device-allocated memory from a host function? I'm writing a C++ class that should be usable on both host and device. My constructor and destructor look like this:

class myClass {
public:
  __host__ __device__ myClass() {
#if defined(__CUDA_ARCH__)
    // compiled for the device: use the device heap
    data = (char*)malloc(DATA_SIZE);
#else
    // compiled for the host: use the CUDA runtime
    cudaMalloc((void**)&data, DATA_SIZE);
#endif
  }

  __host__ __device__ ~myClass() {
#if defined(__CUDA_ARCH__)
    free(data);
#else
    cudaFree(data);
#endif
  }

private:
  char* data;
};

The above code compiles, and I didn't get an error when I constructed an object on the device and freed its memory on the host. But this case is not documented in the CUDA developer documentation.

2
I think you have your cases backwards. If __CUDA_ARCH__ is defined, wouldn't you then want to use the CUDA functions? – Jonathan Grynspan
If __CUDA_ARCH__ is defined, the code is compiled for the device. On the device, I have to use malloc and free. Only on the host is there a need to use cudaMalloc and cudaFree. – Thomas Berger
As long as malloc and cudaMalloc and free and cudaFree are operating on the same heap, I would think it would be OK. – grieve
What makes you think that you've created a class on the device and then freed it on the host? The CUDA C Programming Guide v4.2 states in section B.17: "Memory allocated via malloc() cannot be freed using the runtime (i.e. by calling any of the free memory functions from Sections 3.2.2)." That statement is made in reference to device-malloc'ed memory (on the device heap). – Robert Crovella
Also, if you think you're not getting an error on a particular cudaFree call, you should probably be explicitly checking for CUDA errors after that cudaFree call. Maybe you are, but I don't see it in the sample code you posted. – Robert Crovella

2 Answers

2 votes

For both CUDA 4.2 and CUDA 5.0 RC, the CUDA C Programmer's Guide mentions in section B.17 that: "Memory allocated via malloc() cannot be freed using the runtime (i.e., by calling any of the free memory functions from Device Memory)." (This particular text is taken from the CUDA 5.0 RC document, where "Device Memory" is a hyperlink to section 3.2.2. The CUDA 4.2 document has similar wording.)

I'm wondering if:

1. Things are actually happening the way you think. Depending on how you allocate the variable in the kernel code, it could go out of scope at the completion of the kernel, which would implicitly call your (device-side) destructor.

2. A call to cudaFree with an invalid pointer (perhaps because it is a device pointer, or perhaps because it has already been freed) simply gets ignored.

Without your code to play with, it's all just speculation. But if you're checking for errors and not getting one, then it may be getting ignored.
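To rule out the second possibility, the host code can check the status returned by cudaFree instead of discarding it. A minimal sketch (the allocation size here is just a placeholder; cudaGetErrorString turns the status code into a readable message):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    char* data = 0;
    cudaMalloc((void**)&data, 1024);     // host-side allocation, as in the question

    // cudaFree returns a cudaError_t; checking it reveals silent failures
    cudaError_t err = cudaFree(data);
    if (err != cudaSuccess) {
        printf("cudaFree failed: %s\n", cudaGetErrorString(err));
    }
    return 0;
}

If the pointer passed to cudaFree was never allocated by the runtime (for example, because it came from in-kernel malloc), this check is what distinguishes "it worked" from "the error was never looked at".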

0 votes

Unless NVIDIA has recently lifted this limitation, you must free memory allocated by in-kernel malloc() by calling in-kernel free(). That is, you cannot call cudaFree() to free memory that was allocated within a kernel with malloc().

It may not be returning an error, but it may also be leaking the memory.
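The supported pattern pairs in-kernel malloc() with in-kernel free(). A sketch of what that can look like (kernel names and the size are illustrative; a pointer from device malloc stays valid across kernel launches within the same context, so it can be freed by a later kernel):

__device__ char* d_data;

__global__ void allocate() {
    d_data = (char*)malloc(256);   // allocated on the device heap
}

__global__ void release() {
    free(d_data);                  // freed on the device, never with cudaFree
}

// Host side:
//   allocate<<<1,1>>>();
//   ...launch kernels that use d_data...
//   release<<<1,1>>>();
//   cudaDeviceSynchronize();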