Cuda Error (209): cudaLaunchKernel returned cudaErrorNoKernelImageForDevice

Question

Operating System: CentOS 7
Cuda Toolkit Version: 11.0

Nvidia Driver and GPU Info:

NVIDIA-SMI 450.51.05
Driver Version: 450.51.05
CUDA Version: 11.0
GPU: Quadro M2000M

I'm very new to CUDA programming, so any guidance is greatly appreciated. I have a very simple CUDA C++ program that computes the sum of two arrays in unified memory on the GPU. However, the kernel appears to fail to launch with a cudaErrorNoKernelImageForDevice error. The code is below:

using namespace std;
#include <iostream>
#include <math.h>
#include <cuda_runtime_api.h>

__global__
void add(int n, float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = x[i] + y[i];
}

int main() {
    cout << "!!!Hello World!!!" << endl; // prints !!!Hello World!!!

    int N = 1<<20;
    float *x, *y;

    cudaMallocManaged((void**)&x, N*sizeof(float));
    cudaMallocManaged((void**)&y, N*sizeof(float));

    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    add<<<1, 1>>>(N, x, y);
    cudaGetLastError();
    /**
     * This indicates that there is no kernel image available that is suitable
     * for the device. This can occur when a user specifies code generation
     * options for a particular CUDA source file that do not include the
     * corresponding device configuration.
     *
     *    cudaErrorNoKernelImageForDevice       =     209,
     */

    cudaDeviceSynchronize();

    float maxError = 0.0f;
    for (int i = 0; i < N; i++) {
        maxError = fmax(maxError, fabs(y[i]-3.0f));
    }

    cudaFree(x);
    cudaFree(y);

    return 0;
}

This is a problem with how you compiled the code. How did you compile the code (i.e. what command did you use to compile it, exactly?) Your Quadro M2000M is a maxwell device, with compute capability 5.0, so you need to compile for the correct compute capability. Something like -arch=sm_50 in your compile command. If you have something like -arch=sm_60 that would explain why you are getting this failure. — Robert Crovella
Note that by default, CUDA 11.0 compiles for a default architecture of sm_52, so if you didn't provide any architecture switches on the command line, that would also cause this sort of problem. — Robert Crovella
@RobertCrovella I'm using Eclipse IDE and the compiler command is outputted as the following when it builds the file: /usr/local/cuda-11.0/bin/nvcc --device-debug --debug -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -ccbin g++ -c -o "src/barracuda.o" "../src/barracuda.cu" — cuda newb
So that is the problem. All those _52 in the compile command line are incorrect for your GPU. The IDE has a selection when you are setting up the project (or in the project properties) to change the architecture you are compiling for. You want _50 not _52. — Robert Crovella
@RobertCrovella Okay so given that my GPU has compute capability of 5.0, i should compile with architecture of -arch=sm_XX, where XX represents my compute capability? In this case, XX = 50? I'll give that a try. — cuda newb

Robert Crovella · Accepted Answer · 2020-07-15T17:08:33

The error occurs because a CUDA kernel must be compiled so that the resulting code (PTX or SASS) is compatible with the GPU it runs on. This is a topic with a lot of nuance, so please refer to questions like this (and the links there) for additional background.

To be precise, the GPU architecture is referred to as the compute capability. You can discover the compute capability of your GPU either with a Google search or by running the deviceQuery CUDA sample code. The compute capability is expressed as (major).(minor), for example compute capability 5.2 or 7.0.

When compiling code, it's necessary to specify a compute capability (if you don't, a default compute capability is implied). If the compute capability you specify when compiling matches your GPU, everything should be fine. However, code compiled for a newer/higher compute capability will generally not run on older/lower compute capability GPUs. In that case, you will see errors like what you describe:

cudaErrorNoKernelImageForDevice

209

"no binary for GPU"

or similar. You may also see no explicit error at all if you are not doing proper CUDA error checking. The solution is to match the compute capability specified at compile time with the GPU you intend to run on. The method to do this will vary depending on the toolchain/IDE you are using. For basic nvcc command line usage:

nvcc -arch=sm_XY ...

will specify a compute capability of X.Y

For Eclipse/Nsight Eclipse/Nsight Visual Studio, the compute capability can be specified in the project properties. Depending on the tool, it may be expressed as switch values (e.g. compute_XY, sm_XY) or numerically as X.Y.
