What is the canonical way to check for errors using the CUDA runtime API?

Question

Looking through the answers and comments on CUDA questions, and in the CUDA tag wiki, I see it is often suggested that the return status of every API call should be checked for errors. The API documentation contains functions like cudaGetLastError, cudaPeekAtLastError, and cudaGetErrorString, but what is the best way to put these together to reliably catch and report errors without requiring lots of extra code?

NVIDIA's CUDA samples contain a header, helper_cuda.h, that has macros called getLastCudaError and checkCudaErrors, which do pretty much what is described in the accepted answer. See the samples for demonstrations. Just choose to install the samples along with the toolkit and you will have it. — chappjc
@chappjc I do not think this question and answer pretends to be original, if this is what you mean, but it has the merit to have educated people using CUDA error checking. — Vitality
@JackOLantern No, that's not what I was implying. This Q&A was very helpful to me and it's certainly easier to find than some header in the SDK. I thought it was valuable to point out this is also how NVIDIA handles it and where to look for more. I'd soften the tone of my comment if I could though. :) — chappjc
Debugging tools that let you "approach" where the errors start have improved a great deal on CUDA since 2012. I have not worked with GUI-based debuggers, but the CUDA tag wiki mentions the command-line cuda-gdb. This is a VERY powerful tool, as it allows you to step through actual warps and threads on the GPU itself (it requires a 2.0+ architecture most of the time, though). — opetrenko
@bluefeet: what was the deal with the edit that you rolled back? It looked like nothing actually changed in the markdown, but it was accepted as an edit. Was there something nefarious at work? — talonmies

talonmies · Accepted Answer · 2012-12-26T09:35:49

Probably the best way to check for errors in runtime API code is to define an assert-style handler function and a wrapper macro like this:

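#include <stdio.h>   // fprintf
#include <stdlib.h>  // exit

// Wrap a runtime API call and report failures with the call site's file and line.
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess)
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);  // pass abort=false to report the error and continue
   }
}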

You can then wrap each API call with the gpuErrchk macro, which will process the return status of the API call it wraps, for example:

gpuErrchk( cudaMalloc(&a_d, size*sizeof(int)) );

If there is an error in a call, a textual message describing the error and the file and line in your code where the error occurred will be emitted to stderr and the application will exit. You could conceivably modify gpuAssert to raise an exception rather than call exit() in a more sophisticated application if it were required.
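As a rough illustration of the exception-raising variant mentioned above, here is a minimal sketch. The names CudaError, gpuThrow and gpuErrthrow are hypothetical and are not part of the CUDA toolkit; only cudaMalloc, cudaFree and cudaGetErrorString are real runtime API calls. It assumes the file is compiled with nvcc.

```cuda
#include <cstdio>
#include <sstream>
#include <stdexcept>
#include <string>
#include <cuda_runtime.h>

// Hypothetical exception type (an assumption for this sketch, not a CUDA class).
class CudaError : public std::runtime_error {
public:
    CudaError(cudaError_t code, const char *file, int line)
        : std::runtime_error(format(code, file, line)), code_(code) {}

    // The original status code, in case the caller wants to branch on it.
    cudaError_t code() const noexcept { return code_; }

private:
    static std::string format(cudaError_t code, const char *file, int line) {
        std::ostringstream msg;
        msg << "CUDA error: " << cudaGetErrorString(code)
            << " at " << file << ":" << line;
        return msg.str();
    }
    cudaError_t code_;
};

// Same shape as gpuAssert, but throws instead of calling exit().
inline void gpuThrow(cudaError_t code, const char *file, int line) {
    if (code != cudaSuccess) throw CudaError(code, file, line);
}
#define gpuErrthrow(ans) gpuThrow((ans), __FILE__, __LINE__)

int main() {
    try {
        int *a_d = nullptr;
        gpuErrthrow(cudaMalloc(&a_d, 16 * sizeof(int)));
        gpuErrthrow(cudaFree(a_d));
    } catch (const CudaError &e) {
        // The caller decides how to recover; here we just report and exit.
        fprintf(stderr, "%s\n", e.what());
        return 1;
    }
    return 0;
}
```

Throwing lets the caller release other resources or fall back to a CPU path before giving up, which exit() does not allow.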

A second related question is how to check for errors in kernel launches, which can't be directly wrapped in a macro call like standard runtime API calls. For kernels, something like this:

kernel<<<1,1>>>(a_d);
gpuErrchk( cudaPeekAtLastError() );
gpuErrchk( cudaDeviceSynchronize() );

will first check for an invalid launch argument, then force the host to wait until the kernel stops and check for an execution error. The synchronisation can be eliminated if you have a subsequent blocking API call like this:

kernel<<<1,1>>>(a_d);
gpuErrchk( cudaPeekAtLastError() );
gpuErrchk( cudaMemcpy(a_h, a_d, size * sizeof(int), cudaMemcpyDeviceToHost) );

in which case the cudaMemcpy call can return either errors which occurred during the kernel execution or those from the memory copy itself. This can be confusing for the beginner, and I would recommend using explicit synchronisation after a kernel launch during debugging to make it easier to understand where problems might be arising.
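One way to get explicit synchronisation only while debugging is to put the post-launch checks behind a compile-time switch. The following is a sketch under assumptions: GPU_DEBUG_SYNC and gpuKernelCheck are hypothetical names invented for this example, and the fill kernel is a placeholder. The block repeats gpuErrchk so that it compiles on its own.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
    if (code != cudaSuccess) {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

// Hypothetical helper: GPU_DEBUG_SYNC is an assumed project-defined flag,
// not something the CUDA toolkit defines.
#ifdef GPU_DEBUG_SYNC
// Debug builds: check the launch, then wait so execution errors surface here.
#define gpuKernelCheck() { gpuErrchk(cudaPeekAtLastError()); gpuErrchk(cudaDeviceSynchronize()); }
#else
// Release builds: check the launch only; execution errors show up at the
// next blocking call, such as the cudaMemcpy below.
#define gpuKernelCheck() { gpuErrchk(cudaPeekAtLastError()); }
#endif

// Placeholder kernel for the sketch: writes each element's index.
__global__ void fill(int *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = i;
}

int main()
{
    const int n = 256;
    int a_h[n];
    int *a_d = nullptr;

    gpuErrchk(cudaMalloc(&a_d, n * sizeof(int)));

    fill<<<1, n>>>(a_d, n);
    gpuKernelCheck();

    gpuErrchk(cudaMemcpy(a_h, a_d, n * sizeof(int), cudaMemcpyDeviceToHost));
    gpuErrchk(cudaFree(a_d));

    printf("a_h[%d] = %d\n", n - 1, a_h[n - 1]);
    return 0;
}
```

Building with something like nvcc -DGPU_DEBUG_SYNC example.cu should attribute an execution error to the kernel that caused it, at the cost of serialising host and device after every launch.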

Note that when using CUDA Dynamic Parallelism, a very similar methodology can and should be applied to any usage of the CUDA runtime API in device kernels, as well as after any device kernel launches:

#include <assert.h>
#define cdpErrchk(ans) { cdpAssert((ans), __FILE__, __LINE__); }
__device__ void cdpAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess)
   {
      printf("GPU kernel assert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) assert(0);
   }
}
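As a usage sketch for the device-side macro, a parent kernel might check a child launch as shown below. The parent and child kernels are hypothetical. The sketch assumes a device that supports dynamic parallelism and a build with relocatable device code (for example nvcc -rdc=true with -lcudadevrt). It only peeks at the launch status in device code, because I believe newer toolkits restrict device-side cudaDeviceSynchronize().

```cuda
#include <assert.h>
#include <stdio.h>
#include <cuda_runtime.h>

#define cdpErrchk(ans) { cdpAssert((ans), __FILE__, __LINE__); }
__device__ void cdpAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
    if (code != cudaSuccess) {
        printf("GPU kernel assert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) assert(0);
    }
}

// Hypothetical child kernel: each thread writes its own index.
__global__ void child(int *data)
{
    data[threadIdx.x] = threadIdx.x;
}

// Hypothetical parent kernel: one thread launches the child grid and
// checks the device-side launch status.
__global__ void parent(int *data)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        child<<<1, 32>>>(data);
        cdpErrchk(cudaPeekAtLastError());
    }
}

int main()
{
    int *data = nullptr;
    if (cudaMalloc(&data, 32 * sizeof(int)) != cudaSuccess) return 1;

    parent<<<1, 1>>>(data);

    // Host-side checks: launch status first, then execution status.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "%s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(data);
    return 0;
}
```

If the device-side assert fires, the host should see the failure as an error returned from cudaDeviceSynchronize(), so the host-side check is still needed.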

4 Answers

The C++-canonical way: Don't check for errors...use the C++ bindings which throw exceptions.