I'm new to this and need help linking compiled CUDA object code into a C++ project with g++. Previous questions and answers cover this (here and here), but none of them has worked for me and I can't figure out why. Unfortunately, I'm stuck on Windows for this.
The simple example that I'm trying to get working looks like this:
// kernel.h
int cuda_vec_add(float *h_a, float *h_b, float *h_c, int n);
The CUDA code that adds two vectors:
// kernel.cu
#include <kernel.h>
__global__ void vec_add_kernel(float *a, float *b, float *c, int n) {
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
int cuda_vec_add(float *h_a, float *h_b, float *h_c, int n) {
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, n*sizeof(float));
    cudaMalloc(&d_b, n*sizeof(float));
    cudaMalloc(&d_c, n*sizeof(float));
    cudaMemcpy(d_a, h_a, n*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, n*sizeof(float), cudaMemcpyHostToDevice);
    vec_add_kernel<<<(n-1)/256+1, 256>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, n*sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
And the C++ code calling the CUDA function:
// main.cpp
#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include <kernel.h>
using namespace std;
int main() {
    const int n = 5;
    float h_A[n] = { 0., 1., 2., 3., 4. };
    float h_B[n] = { 5., 4., 3., 2., 1. };
    float h_C[n];
    cuda_vec_add(h_A, h_B, h_C, n);
    printf("{ 0.0, 1.0, 2.0, 3.0, 4.0 } + { 5.0, 4.0, 3.0, 2.0, 1.0 } = { %0.01f, %0.01f, %0.01f, %0.01f, %0.01f }\n",
           h_C[0], h_C[1], h_C[2], h_C[3], h_C[4]);
    cin.get();
    return 0;
}
I first compiled the CUDA code to "kernel.o" using nvcc:
nvcc -I. -arch=sm_30 -c kernel.cu -o kernel.o
This seems to work fine. But then when I try to link it to my C++ project:
g++ -I. -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\x64" main.cpp kernel.o -lcuda -lcudart
I get the following error:
Warning: corrupt .drectve at end of def file
C:\Users\Geoff\AppData\Local\Temp\cczu0qxj.o:main.cpp:(.text+0xbe):
undefined reference to `cuda_vec_add(float*, float*, float*, int)'
collect2.exe: error: ld returned 1 exit status
I'm using CUDA toolkit 7.5 with Visual Studio 2013 and gcc version 5.2.0.
So far I've tried:
Compiling everything with nvcc. This works fine except it doesn't fit the requirements of my project.
The solution posted here using the -dlink flag in nvcc. Unfortunately, this returned the same error.
Some other, less productive things.
Really sorry if this ends up being a dumb mistake, but I've been stuck on it for a while. Thanks for your help.
nm on the kernel.o to see what the cuda_vec_add function looks like? Also, the warning about the corruption seems weird. – Rudolfs Bundulis
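One thing the nm output might reveal is a name-mangling mismatch. On Windows, nvcc compiles host code with the Visual Studio compiler (cl.exe), and MSVC and g++ mangle C++ function names differently, so the g++-compiled main.cpp may be looking for a symbol that kernel.o spells another way. Below is a minimal sketch, assuming that is the cause, of declaring cuda_vec_add with C linkage so that both compilers refer to the same unmangled name. The CPU loop is a hypothetical stand-in for the CUDA implementation, included only so the snippet compiles on its own. This is not confirmed to resolve the .drectve warning or any runtime-library incompatibility between the two toolchains.

```cpp
// kernel.h (sketch): give the function C linkage so its symbol name
// is not subject to compiler-specific C++ name mangling.
#ifdef __cplusplus
extern "C" {
#endif
int cuda_vec_add(float *h_a, float *h_b, float *h_c, int n);
#ifdef __cplusplus
}
#endif

// Hypothetical CPU stand-in for the definition in kernel.cu.
// In the real project this body would be the existing CUDA code;
// only the extern "C" linkage is the point of this sketch.
extern "C" int cuda_vec_add(float *h_a, float *h_b, float *h_c, int n) {
    for (int i = 0; i < n; ++i) {
        h_c[i] = h_a[i] + h_b[i];  // element-wise sum, as the kernel does
    }
    return 0;
}
```

With a header like this included from both kernel.cu and main.cpp, the call site and the definition should agree on the plain symbol cuda_vec_add, which can then be checked in the nm output.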