
My algorithm (parallel multi-frontal Gaussian elimination) needs to dynamically allocate memory (for tree building) inside a CUDA kernel. Does anyone know whether gpuocelot supports this?

According to this answer (stackoverflow-link) and the CUDA Programming Guide, I should be able to do this. But with gpuocelot I get errors at runtime.

Errors:

  1. When I call malloc() inside the kernel, I get this error:
    (2.000239) ExternalFunctionSet.cpp:371:  Assertion message: LLVM required to call external host functions from PTX.
    solver: ocelot/ir/implementation/ExternalFunctionSet.cpp:371: void ir::ExternalFunctionSet::ExternalFunction::call(void*, const ir::PTXKernel::Prototype&): Assertion `false' failed.
  2. When I try to get or set the malloc heap size (in host code):
    solver: ocelot/cuda/implementation/CudaRuntimeInterface.cpp:811: virtual cudaError_t cuda::CudaRuntimeInterface::cudaDeviceGetLimit(size_t*, cudaLimit): Assertion `0 && "unimplemented"' failed.

Maybe I have to tell the compiler (somehow) that I want to use device-side malloc()?
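To be concrete, this is roughly the pattern I'm trying to use: in-kernel malloc()/free() plus cudaDeviceSetLimit()/cudaDeviceGetLimit() on the host to size the device heap. It is only a minimal sketch (Node, buildNodes and freeNodes are placeholder names, not my actual solver code), and it assumes a device of compute capability 2.0 or later:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Placeholder node type for illustration only.
    struct Node {
        int value;
        Node* left;
        Node* right;
    };

    // Each thread allocates one node on the device heap with in-kernel malloc().
    __global__ void buildNodes(Node** out)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        Node* n = static_cast<Node*>(malloc(sizeof(Node)));
        if (n != nullptr) {          // malloc returns NULL when the device heap is exhausted
            n->value = tid;
            n->left  = nullptr;
            n->right = nullptr;
        }
        out[tid] = n;
    }

    // Memory obtained from device malloc() must be freed on the device.
    __global__ void freeNodes(Node** out)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (out[tid] != nullptr)
            free(out[tid]);
    }

    int main()
    {
        // Enlarge the device malloc heap before any kernel launch.
        // This (and the matching cudaDeviceGetLimit) is what hits the
        // "unimplemented" assertion under gpuocelot.
        cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 * 1024 * 1024);

        size_t heap = 0;
        cudaDeviceGetLimit(&heap, cudaLimitMallocHeapSize);
        printf("malloc heap size: %zu bytes\n", heap);

        Node** d_out = nullptr;
        cudaMalloc(&d_out, 256 * sizeof(Node*));

        buildNodes<<<1, 256>>>(d_out);
        freeNodes<<<1, 256>>>(d_out);
        cudaDeviceSynchronize();

        cudaFree(d_out);
        return 0;
    }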

Any advice?

I am reasonably sure the emulator has baked-in malloc, free, and printf support, but I am not so certain about the LLVM backend. You should really ask this on the Ocelot mailing list. It isn't really a CUDA question at all and I am tempted to remove the CUDA tag. – talonmies

1 Answer


You can find the answer on the gpuocelot mailing list:

gpuocelot mailing list link