Linking error while separating CUDA function into declaration and definition

Question

I'm following the instructions in an NVIDIA blog post on how to separate functions called from a kernel into a declaration and a definition. With CUDA 10 and the Visual Studio compiler, the build fails with linking errors. As the post instructs, I've added -dc to the nvcc compiler options. All files are in the same folder and belong to the same project.

test.cuh

__host__ __device__ float test(float, float);

test.cu

#include "test.cuh"    
__host__ __device__ float test(float a, float b)
{
    return a + b;
}

kernel.cu

#include <stdio.h>
#include "test.cuh"
__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] =  test(a[i], b[i]);
}

Linking error

1>kernel.cu.obj : error LNK2019: unresolved external symbol __cudaRegisterLinkedBinary_41_tmpxft_0000796c_00000000_7_kernel_cpp1_ii_f853efa9 referenced in function "void __cdecl __nv_cudaEntityRegisterCallback(void * *)" (?__nv_cudaEntityRegisterCallback@@YAXPEAPEAX@Z)
1>test.cuh.obj : error LNK2019: unresolved external symbol __cudaRegisterLinkedBinary_39_tmpxft_00006d84_00000000_7_test_cpp1_ii_f2c23be0 referenced in function "void __cdecl __nv_cudaEntityRegisterCallback(void * *)" (?__nv_cudaEntityRegisterCallback@@YAXPEAPEAX@Z)
1>test.cu.obj : error LNK2019: unresolved external symbol __cudaRegisterLinkedBinary_39_tmpxft_00008044_00000000_7_test_cpp1_ii_f2c23be0 referenced in function "void __cdecl __nv_cudaEntityRegisterCallback(void * *)" (?__nv_cudaEntityRegisterCallback@@YAXPEAPEAX@Z)
1>D:\Workspaces\src\sandbox\cuda_dc\x64\Debug\cuda_dc.exe : fatal error LNK1120: 3 unresolved externals

It makes no difference if I change the file extensions to ".c", ".cpp", or ".cuh".

You don't add -dc to the compile options on Windows. You start by selecting a relocatable device code project. The referenced post is for Linux, not Windows, and the process to build a relocatable device code project involves more steps than just adding -dc (even on Linux, if the host linker is used for the final link phase). — Robert Crovella
Also, why are you trying to use C linkage for device functions? That makes no sense, even if the toolchain (accidentally) supports it. — talonmies
Removed the "C" linkage; still the same problem. In VS I enabled -rdc=true, but still no luck. — ryzhiy
If you have added test.cuh as a compilation target in your VS project, that is almost certainly incorrect, although I don't believe it is the source of these errors. — Robert Crovella

Robert Crovella · Accepted Answer · 2019-01-18T17:55:24

These are the steps I followed, using the code you have shown, plus adding a simple main() function so we can have a complete project.

(In Visual Studio)

File..New..Project
on left hand side, scroll down to NVIDIA and select it
select CUDA X.Y runtime project, give project a name, click OK
at the top menu bar, next to Debug, change x86 to x64

The project should have a default file in it, kernel.cu. Replace its contents with the following (your kernel.cu, modified to add a main function):

#include <stdio.h>
#include "test.cuh"
__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = test(a[i], b[i]);
}
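int main() {
    // Placeholder arguments: enough to give the project a complete main()
    // for building and linking; these are not valid device buffers.
    int *c = NULL;
    int *a = NULL;
    int *b = NULL;
    addKernel<<<1, 1>>>(c, a, b);
}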
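The main() above launches addKernel with NULL pointers, which is enough to show that the project builds and links but would not produce a meaningful result if run. Below is a minimal sketch of a main() that also exercises the cross-file call at run time. The helper name addWithCuda and the input values are hypothetical, and test() is defined inline only so the snippet compiles on its own; in the two-file project, replace that definition with #include "test.cuh" and keep building with relocatable device code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Inlined only so this snippet is self-contained; in the project this
// definition lives in test.cu and kernel.cu just includes "test.cuh".
__host__ __device__ float test(float a, float b) { return a + b; }

__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = test(a[i], b[i]);
}

// Hypothetical helper: copies a and b to the GPU, launches one thread per
// element, copies the result back into c, and returns the first CUDA error.
cudaError_t addWithCuda(int *c, const int *a, const int *b, int n)
{
    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    size_t bytes = n * sizeof(int);

    cudaError_t err = cudaMalloc(&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_c, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        addKernel<<<1, n>>>(d_c, d_a, d_b);
        err = cudaGetLastError();           // launch-configuration errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors raised while the kernel ran
    if (err == cudaSuccess) err = cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer is a no-op, so partial allocations are safe to free.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return err;
}

int main()
{
    const int n = 4;
    int a[n] = {1, 2, 3, 4};
    int b[n] = {10, 20, 30, 40};
    int c[n] = {0};

    cudaError_t err = addWithCuda(c, a, b, n);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < n; ++i)
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    return 0;
}
```

Checking cudaGetLastError() right after the launch and the return value of cudaDeviceSynchronize() afterwards separates configuration errors from faults that happen inside the kernel.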

(in windows, e.g. using file manager)

In the project folder where kernel.cu is located, place your files test.cuh and test.cu (the updated versions you posted, without C linkage).

(in visual studio)

Go to the project in the Solution Explorer window, right-click on the project name, and select Properties
On the left-hand side of the dialog, select "CUDA C/C++"
On the right-hand side, change the drop-down next to "Generate Relocatable Device Code" from No to Yes
On the left-hand side, select "CUDA Linker" and confirm that "Perform Device Link" is already set to Yes
Select OK to close the dialog
Again, in the Solution Explorer window, right-click on the project name and select Add... Existing Item
A file selection dialog should open. You should see the kernel.cu file plus the test.cu and test.cuh files you added to the folder
Select and add the test.cu file
Now select Build...Rebuild Solution

When I do those steps, I get a clean compilation with no errors:

1>------ Rebuild All started: Project: test37, Configuration: Debug x64 ------
1>
1>  c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include"  -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile   -g   -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -o x64\Debug\kernel.cu.obj "c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37\kernel.cu" -clean
1>CUDACOMPILE : nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1>  kernel.cu
1>
1>  c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include"  -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile   -g   -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -o x64\Debug\test.cu.obj "c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37\test.cu" -clean
1>CUDACOMPILE : nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1>  test.cu
1>  Compiling CUDA source file kernel.cu...
1>  Compiling CUDA source file test.cu...
1>
1>  c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2015 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -rdc=true -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include"  -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static  -g   -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -o x64\Debug\kernel.cu.obj "c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37\kernel.cu"
1>
1>  c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2015 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -rdc=true -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include"  -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static  -g   -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -o x64\Debug\test.cu.obj "c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37\test.cu"
1>CUDACOMPILE : nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1>  kernel.cu
1>CUDACOMPILE : nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1>  test.cu
1>
1>  c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -dlink -o x64\Debug\test37.device-link.obj -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64" cudart.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib  -gencode=arch=compute_20,code=sm_20 -G --machine 64 x64\Debug\kernel.cu.obj x64\Debug\test.cu.obj
1>CUDALINK : nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1>  cudart.lib
1>  kernel32.lib
1>  user32.lib
1>  gdi32.lib
1>  winspool.lib
1>  comdlg32.lib
1>  advapi32.lib
1>  shell32.lib
1>  ole32.lib
1>  oleaut32.lib
1>  uuid.lib
1>  odbc32.lib
1>  odbccp32.lib
1>  kernel.cu.obj
1>  test.cu.obj
1>  test37.vcxproj -> c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\x64\Debug\test37.exe
1>  test37.vcxproj -> c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\x64\Debug\test37.pdb (Full PDB)
1>  copy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\cudart*.dll" "c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\x64\Debug\"
1>  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\cudart32_80.dll
1>  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\cudart64_80.dll
1>          2 file(s) copied.
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========

My view is this: if you follow the above steps exactly, starting with a new project and using the files I indicate, and you get the same results I do, then the problem described in your question relates to something you haven't shown or described. In that case, you should provide an MCVE with the same level of detail I have given in my answer: every step used to create, build, and compile the project, along with the console build output and all files used.

I've used CUDA 8 and Visual Studio 2015, but I don't expect substantial differences in what I am describing here with newer VS and CUDA versions.
