Is there a way in CUDA to allocate memory dynamically in device-side functions? I could not find any examples of doing this.
From the CUDA C Programming Guide:
B.15 Dynamic Global Memory Allocation

void* malloc(size_t size);
void free(void* ptr);

allocate and free memory dynamically from a fixed-size heap in global memory.

The CUDA in-kernel malloc() function allocates at least size bytes from the device heap and returns a pointer to the allocated memory, or NULL if insufficient memory exists to fulfill the request. The returned pointer is guaranteed to be aligned to a 16-byte boundary.

The CUDA in-kernel free() function deallocates the memory pointed to by ptr, which must have been returned by a previous call to malloc(). If ptr is NULL, the call to free() is ignored. Repeated calls to free() with the same ptr has undefined behavior.

The memory allocated by a given CUDA thread via malloc() remains allocated for the lifetime of the CUDA context, or until it is explicitly released by a call to free(). It can be used by any other CUDA threads even from subsequent kernel launches. Any CUDA thread may free memory allocated by another thread, but care should be taken to ensure that the same pointer is not freed more than once.