1
vote

I'm running CentOS release 5.9 (Final) with CUDA, having a Tesla card with major version 1 and minor version 3. The following is my kernel code:

__global__ void foo(int* pos, int t)
{
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    t = pos[index + 1] - pos[index];
    char* temp = (char*)malloc(t);
}

I want to allocate t bytes dynamically.

This gives me the error:

calling a host function("malloc") from a __device__/__global__ function("foo") is not allowed.

What can I do to solve this problem?

1
Please provide an example which actually produces the mentioned error, and show how you are compiling it. The code from your question won't compile, but only due to unbalanced parentheses. Also, devices with CC 1.3 do not support memory allocation from within a kernel; you need at least CC 2.0. Also, to allocate memory from a kernel you should call malloc, not cudaMalloc (though, according to your error message, this is what you are doing in your actual code). - aland
Thank you for your guidance. Can you suggest another solution for the same problem? - Sushant Yelpale

1 Answer

2
votes

Because you are using a compute capability 1.3 device, kernel malloc and the C++ new operator are not supported (this is clearly explained in the CUDA C Programming Guide).
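For reference, on a compute capability 2.0 or later device (compiled with, e.g., -arch=sm_20) the in-kernel allocation you wrote would be legal. A sketch of what that looks like, not applicable to your 1.3 card:

```cuda
__global__ void foo(int* pos, int t)
{
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    t = pos[index + 1] - pos[index];
    // Device-side heap allocation; supported only on CC >= 2.0.
    char* temp = (char*)malloc(t);
    if (temp != NULL) {
        // ... use temp[0 .. t-1] ...
        free(temp);   // must also be freed on the device
    }
}
```

Note that device malloc draws from a separate device heap whose size is set with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) before the kernel launch, and that malloc can return NULL, so the result should always be checked.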

Your only alternatives are either to pre-allocate a scratch global memory area using host-side memory allocation (it would need to be at least the largest value of t times the number of threads launched on the GPU), or to declare temp as a statically sized local memory array in the kernel. The pre-allocated memory could either be passed as a kernel argument or written to a constant memory pointer which the kernel can read. If it turns out you only need a relatively small, a priori known maximum value of t, there may be performance benefits in templating the kernel and passing max(t) as a template parameter.
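A minimal sketch of the pre-allocation approach, assuming the maximum per-thread size (max_t below) is known on the host; the names scratch and max_t are illustrative, not from your code:

```cuda
#include <cuda_runtime.h>

// Each thread gets its own max_t-byte slice of a scratch buffer
// that the host allocated once, instead of calling malloc in the kernel.
__global__ void foo(int* pos, char* scratch, int max_t)
{
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    int t = pos[index + 1] - pos[index];
    char* temp = scratch + (size_t)index * max_t;
    // ... use temp[0 .. t-1] ...
}

int main()
{
    const int nblocks = 4, nthreads = 256;
    const int max_t = 64;   // largest possible t, known in advance

    char* scratch;
    cudaMalloc(&scratch, (size_t)nblocks * nthreads * max_t);

    // foo<<<nblocks, nthreads>>>(d_pos, scratch, max_t);
    // (d_pos is your device array of offsets, omitted here)

    cudaFree(scratch);
    return 0;
}
```

Because every thread indexes a disjoint slice, no synchronization is needed between threads, and the single cudaMalloc replaces per-thread device allocations entirely.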