How to create a Cuda module without a host compiler

Question

I would like to create a CUDA module for use with the CUDA Driver API without involving the host compiler. The main motivation is that decisions in my group about when to change versions of the host compiler and the CUDA compiler are not always within our control. I would like to guard against cases where an upgrade on one side makes the host and CUDA compilers incompatible.

For example, I have a file, kernel.cu, that contains only CUDA device code. I would like to compile it to PTX:

nvcc --ptx kernel.cu

and then subsequently load this into my executing program like this:

cuModuleLoad(&module, "kernel.ptx");
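
For reference, the loading side could be sketched roughly as below. This is only a sketch under some assumptions: `read_ptx` and `load_ptx_module` are hypothetical helper names (not part of CUDA), device 0 and its primary context are assumed, and the PTX text is passed to `cuModuleLoadData` from memory rather than to `cuModuleLoad` by file name. The driver-API part is guarded with `__has_include` so that the file still compiles on a machine where `cuda.h` is not installed.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read a whole PTX file into a std::string (c_str() gives the
// NUL-terminated text that cuModuleLoadData accepts).
std::string read_ptx(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

#if defined(__has_include)
#  if __has_include(<cuda.h>)
#include <cuda.h>

// Hypothetical helper: make the primary context of the chosen device
// current and load the PTX text as a module. Link with -lcuda.
inline CUmodule load_ptx_module(const std::string& ptx, int device_ordinal = 0) {
    auto check = [](CUresult r, const char* what) {
        if (r != CUDA_SUCCESS)
            throw std::runtime_error(std::string(what) + " failed");
    };
    CUdevice dev;
    CUcontext ctx;
    CUmodule module;
    check(cuInit(0), "cuInit");
    check(cuDeviceGet(&dev, device_ordinal), "cuDeviceGet");
    check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain");
    check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent");
    // The driver JIT-compiles the PTX for the current device here.
    check(cuModuleLoadData(&module, ptx.c_str()), "cuModuleLoadData");
    return module;
}
#  endif
#endif
```

With the toolkit headers available, usage would be along the lines of `CUmodule module = load_ptx_module(read_ptx("kernel.ptx"));`. Note that only `cuda.h` and the driver library are needed on the host side; neither nvcc nor `cuda_runtime.h` is involved at load time.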

When I try to compile the CUDA file, I get the following error:

In file included from /usr/local/cuda/bin/../include/cuda_runtime.h:59:0,
             from <command-line>:0:
/usr/local/cuda/bin/../include/host_config.h:82:2: error: #error -- unsupported GNU version! gcc 4.5 and up are not supported!

Since I didn't include cuda_runtime.h in my code, I compiled in verbose mode to see what was going on and saw that the first step is to use my host compiler and force inclusion of this file:

> nvcc --verbose --ptx kernel.cu
#$ gcc -E -x c++ -D__CUDACC__ -C  "-I/usr/local/cuda/bin/../include"
"-I/usr/local/cuda/bin/../include/cudart"   -include "cuda_runtime.h"
-m64 -o "/tmp/tmpxft_00001058_00000000-4_kernel.cpp4.ii" "kernel.cu"

Since I know my .cu file has no host code, I would like to just force nvcc to skip the host integration steps, but I can't find a way to do that. Does anyone know if/how this can be done?

nvcc isn't a compiler and it requires the host preprocessor and compiler to compile device code. There isn't a way around it AFAIK — talonmies
Yes, nvcc isn't a compiler, but it does have the capability, once the code is split into host and device sections, to route the device code into a device-only compilation trajectory (leading to the ptx file I want). I guess the question really is can I tell nvcc that I have already split the code and there is no host code that needs to be compiled. — Blake Nelson
Is there a reason why you can't get a host compiler, create the ptx file and throw everything else away? — Christian Sarofeen
@BlakeNelson: Have a look at this diagram. Within the device code trajectory (after "splitting" in your terminology), there are successive calls to the CUDA and host preprocessors before the device compiler is called. Those preprocessor steps are including the device standard library overloads and performing macro expansions in device code. I don't think there is a way around it. — talonmies

Christian Sarofeen · Accepted Answer · 2015-05-22T20:18:55

It doesn't look like there is a way to do what you'd like. I would compile with `nvcc --keep --ptx code.cu` and go through the compilation step by step. Doing this, I could not see any evidence that what you want is possible using nvcc.
