CUDA: How to link a specific obj, ptx, cubin from a separate compilation?

Question

I have a fairly large CUDA/C++ project that compiles to a static library. The toolchain is CUDA Toolkit 9.0/9.2 and VS 2017. I cannot change the company toolchain. Our most expensive kernel was hit by an nvcc compiler regression introduced in the 9.0 Toolkit. I filed this on the NVIDIA developer website and received confirmation of the regression. That was about a year ago, and the ticket is still open. Maybe the 10.0 Toolkit will fix it.

But I cannot wait. So my plan is to compile just this one specific kernel using the 8.0 nvcc compiler and v140 (VS 2015) compiler. It is a single .hpp file with __device__ decorator for the kernel declaration, and a .cu file with the definition. The kernel does not call other kernels; it is a rather simple kernel.

From the v140 Native Tools Command Prompt, I executed:

nvcc -x cu -arch=sm_61 -dc kernel.cu

This produced a kernel.obj file. I have read the CUDA Compiler Driver NVCC documentation, but I confess I do not entirely understand it. There are several compilation phases, and I do not see which is the correct course for my case.

My question is how to link this object file into my greater static library? If someone could point me to the correct series of commands, or better yet, how to include this into the VS Project, presumably with kernel.hpp and kernel.obj, I would be most grateful.

Static libraries are just collections of object files. Object files are added to a static library, not linked. On Windows, you would use the library manager lib to do that. You may find a thread on the NVIDIA forums useful where I demonstrated the handling of a static library on both Linux and Windows with a worked example. I don't deal with GUIs, so cannot tell you how to use lib from a VS project. — njuffa
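The comment above can be sketched as a short command sequence. This is a minimal sketch, not a tested recipe for this project: it assumes a VS 2015 (v140) Native Tools prompt with the CUDA 8.0 nvcc on the PATH, that kernel.cu needs no device code from other files, and that kernel_cuda80.lib is just an illustrative name. It uses -c instead of the -dc from the question, because -dc produces relocatable device code that requires an extra device-link step.

```bat
rem Assumption: VS 2015 (v140) Native Tools prompt, CUDA 8.0 nvcc on PATH.
rem -c compiles in whole-program mode: the device code is embedded in the
rem host object and no separate device-link step is required.
nvcc -arch=sm_61 -c kernel.cu -o kernel.obj

rem Add the object to a new static library with the Microsoft library manager.
rem kernel_cuda80.lib is a hypothetical name chosen for this sketch.
lib /OUT:kernel_cuda80.lib kernel.obj

rem List the members of the library to confirm kernel.obj was added.
lib /LIST kernel_cuda80.lib
```

If the object must stay compiled with -dc, a device-link step such as nvcc -arch=sm_61 -dlink kernel.obj -o kernel_dlink.obj would presumably be needed first, with both objects then added to the library.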

Tyson Hilmer · Accepted Answer · 2018-08-27T08:27:13

Following njuffa's comment above, the simplest solution is to create a static library for that kernel using the earlier, performant toolchain (VS 2015 and CUDA 8.0 Toolkit), then link that library into the greater project with the later toolchain. I did so with success.

I created a CUDA 8.0 template project in VS 2015 containing only the kernel source and header, with the compilation target set to static library. This produced a .lib file. The .lib file and header were then added to the C++ linker settings of the greater project, which uses VS 2017 and CUDA 9.0. All test executables using this static library pass. This is a much simpler solution than trying to recompile from an intermediate compilation format (ptx, cubin, etc.).
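For readers who prefer the command line to the Visual Studio project settings, a rough equivalent of the consuming step might look like the following. It is only a sketch under stated assumptions: the names kernel_cuda80.lib, greater.lib, greater_combined.lib, and test_main.cu are hypothetical, and the commands are assumed to run from a VS 2017 prompt with the CUDA 9.x nvcc on the PATH.

```bat
rem Assumption: VS 2017 Native Tools prompt, CUDA 9.x nvcc on PATH.
rem All file names below are hypothetical placeholders.

rem Option A: merge the CUDA 8.0 library into the larger static library.
rem lib accepts existing .lib files as inputs and copies their members.
lib /OUT:greater_combined.lib greater.lib kernel_cuda80.lib

rem Option B: keep the libraries separate and name both when linking
rem a test executable that includes kernel.hpp.
nvcc -arch=sm_61 test_main.cu greater.lib kernel_cuda80.lib -o test_main.exe
```

Either way, the kernel's object code stays exactly as the CUDA 8.0 compiler produced it; only the packaging and the final host link use the newer toolchain.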

Ultimately, though, the real solution was to refactor the kernel to use shared memory more efficiently, which removed the need for the older nvcc version.
