My algorithm (parallel multifrontal Gaussian elimination) needs to dynamically allocate memory (tree building) inside a CUDA kernel. Does anyone know whether gpuocelot supports such things?

According to this: stackoverflow-link and the CUDA programming guide, I should be able to do this, but with gpuocelot I get errors at runtime.
Errors:

- When I call `malloc()` inside the kernel, I get this error:

```
(2.000239) ExternalFunctionSet.cpp:371: Assertion message: LLVM required to call external host functions from PTX.
solver: ocelot/ir/implementation/ExternalFunctionSet.cpp:371: void ir::ExternalFunctionSet::ExternalFunction::call(void*, const ir::PTXKernel::Prototype&): Assertion `false' failed.
```

- When I try to get or set the malloc heap size (in host code):

```
solver: ocelot/cuda/implementation/CudaRuntimeInterface.cpp:811: virtual cudaError_t cuda::CudaRuntimeInterface::cudaDeviceGetLimit(size_t*, cudaLimit): Assertion `0 && "unimplemented"' failed.
```
Maybe I have to indicate (somehow) to the compiler that I want to use the device-side `malloc()`?

Any advice?
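For reference, this is a minimal sketch of how in-kernel allocation is normally set up with the CUDA runtime (the kernel name and buffer sizes here are my own illustration, not code from the question): the host enlarges `cudaLimitMallocHeapSize` before the first launch, and the kernel then calls the device-side `malloc`/`free`. On real hardware this requires compute capability ≥ 2.0 and compiling with `-arch=sm_20` or newer; under gpuocelot the same calls trigger the assertions shown above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread allocates a small scratch buffer from the device heap,
// writes to it, and frees it. (Illustrative only; a real tree builder
// would keep the allocations alive and link them together.)
__global__ void treeBuildSketch()
{
    int *node = (int *)malloc(4 * sizeof(int));   // device-side malloc
    if (node == NULL) return;                     // heap may be exhausted
    node[0] = threadIdx.x;
    free(node);                                   // device-side free
}

int main()
{
    // Raise the device malloc heap limit (the default is small) BEFORE
    // the first kernel launch; this is the host-side call whose getter,
    // cudaDeviceGetLimit, Ocelot reports as unimplemented.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 128 * 1024 * 1024);

    size_t heap = 0;
    cudaDeviceGetLimit(&heap, cudaLimitMallocHeapSize);
    printf("malloc heap size: %zu bytes\n", heap);

    treeBuildSketch<<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Compiled with, e.g., `nvcc -arch=sm_20 example.cu`, this runs on a physical Fermi-or-newer GPU; the question is whether any gpuocelot backend supports the same path.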
Ocelot has `malloc`, `free` and `printf` support, but I am not so certain about the LLVM backend. You should really ask this on the Ocelot mailing list. It isn't really a CUDA question at all and I am tempted to remove the CUDA tag. – talonmies