undefined reference to function c++ thrown by intermediate object file

Question

I was trying to run a simple CUDA program that performs matrix addition on matrices of a specific size.

Here is my code:

main.cpp

/* sample CUDA program to prove that (AB)transpose=(B)transpose*(A)transpose */

#include "common.h"
#include "utils.h"
#include <iostream>
#include <stdlib.h>
#include <time.h>


using namespace std;


void preprocess(int *A, int *B, int *C, int **da, int **db, int **dc,int M, int N, int P,int blksize);
void checktransposeppt(int *da, int *db, int *dc);

void display(int a[], int b[])
{
    //display the matrices
}

int main()
{

    int A[M*P],B[P*N];
    int C[M*N];

    int *da;
    int *db;
    int *dc;


    //initializing values for A and B

    display(A,B);

    preprocess(A,B,C,&da,&db,&dc,M,N,P,blksize);

    checktransposeppt(da,db,dc);

    checkCudaErrors(cudaFree(da));
    checkCudaErrors(cudaFree(db));
    checkCudaErrors(cudaFree(dc));

}

And here is preprocess.cpp. It basically does the cudaMalloc calls, a host-to-device cudaMemcpy of the input arrays, and a device-to-host cudaMemcpy of the result:

#include "utils.h"

void preprocess(int *h_a, int *h_b, int *h_c,int **d_a,int **d_b,int **d_c,int M, int N, int P, int blksize)
{

    checkCudaErrors(cudaFree(0));
    checkCudaErrors(cudaMalloc(d_a,(size_t)sizeof(int)*(M*P)));
    checkCudaErrors(cudaMalloc(d_b,(size_t)sizeof(int)*(P*N)));
    checkCudaErrors(cudaMalloc(d_c,(size_t)sizeof(int)*(M*N)));
    checkCudaErrors(cudaMemset(*d_c,0,(size_t)sizeof(int)*(M*N)));  // memset the device buffer, not the host pointer-to-pointer

    checkCudaErrors(cudaMemcpy(*d_a,h_a,(size_t)sizeof(int)*(M*P),cudaMemcpyHostToDevice));
    checkCudaErrors(cudaMemcpy(*d_b,h_b,(size_t)sizeof(int)*(P*N),cudaMemcpyHostToDevice));
    checkCudaErrors(cudaMemcpy(h_c,*d_c,(size_t)sizeof(int)*(M*N),cudaMemcpyDeviceToHost));
}

And this is common.h, a central place to include most of the external headers and define the global variables:

#ifndef COMMON_H
#include <cuda.h>
#include <cuda_runtime.h>

#define COMMON_H

extern int M=256;
extern int P=128;
extern int N=64;
extern int blksize=16;

extern dim3 gridsize(M/blksize,N/blksize,1);
extern dim3 blocksize(blksize,blksize,1);

#endif

And here is kernel.cu:

#include "utils.h"
#include "common.h"

__global__ void abkerneltranspose(int *d_a,int *d_b,int *d_c,int N);


__global__ void abkerneltranspose(int *d_a,int *d_b,int *d_c,int N)
{
    int blkx=blockIdx.x;
    int blky=blockIdx.y;
    int thdx=threadIdx.x;
    int thdy=threadIdx.y;

    int row=blkx*blockDim.x+threadIdx.x;
    int col=blky*blockDim.y+threadIdx.y;

    d_c[row*N+col]=d_a[row*N+col]+d_b[row*N+col];

}

void checktransposeppt(int *d_a,int *d_b,int *d_c)
{

    dim3 gridsize(M/blksize,N/blksize,1);
    dim3 blocksize(blksize,blksize,1);

    abkerneltranspose<<<gridsize,blocksize>>>(d_a,d_b,d_c,N);
}

And here is the makefile, which is where I suspect the culprit is:

NVCC=nvcc
NVCC_OPTS=-O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64

all: app

app: gpucompile.o cpucompile.o Makefile
    nvcc -o app  gpucompile.o cpucompile.o -L $(NVCC_OPTS)  $(GCC_OPTS)

gpucompile.o: kernel.cu
    nvcc -c kernel.cu $(NVCC_OPTS)

cpucompile.o: main.cpp preprocess.cpp 
    nvcc -x cu main.cpp preprocess.cpp -I. -I $(GCC_OPTS) -I $(CUDA_INCLUDEPATH)

clean:
    rm -f *.o hw *.bin

OK, here is the problem.

When I run make, it compiles correctly but then throws this error:

/tmp/tmpxft_00002074_00000000-21_main.o: In function `main':
tmpxft_00002074_00000000-3_main.cudafe1.cpp:(.text+0x543): undefined reference to `checkTransposeppt(int*, int*, int*)'

I am really not sure why this occurs. I compile the .cpp code separately (just ignore -x cu; it does not cause the error), do the same for kernel.cu, and then link the two.

But this error is thrown by the intermediate main.o, which leads me to believe that it failed while creating cpucompile.o. But couldn't the linker wait until it has gpucompile.o and then link the two?

I also tried creating separate object files main.o, preprocess.o, and kernel.o and linking them all in one step. Then I get the following additional error:

/tmp/tmpxft_00002f88_00000000-16_main.o: In function `main':
tmpxft_00002f88_00000000-3_main.cudafe1.cpp:(.text+0x532): undefined reference to `preprocess(int*, int*, int*, int**, int**, int**, int, int, int, int)'

I must have missed something basic. Can someone please explain what is going wrong here?

Also, what is the best practice for a project like this? I compile the device code and the CPU code separately and then link them. I also have a common header where I include the external headers and define global variables/classes/function definitions. Any suggestions?

Robert Crovella · Accepted Answer · 2015-07-25T16:50:47

Yes, your makefile is not correct.

The application target you want to build is app, and the makefile target for that is set up in a possibly workable fashion.

The app target requires gpucompile.o and cpucompile.o objects.

You have specified a target for each required object.

The gpucompile.o target is set up in a possibly workable fashion. There is still a problem, though: nvcc -c kernel.cu creates kernel.o by default, not gpucompile.o (you would need to add -o gpucompile.o for that).

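The cpucompile.o target is not workable. It is broken in several ways. First, it appears to be copied from a makefile target that includes a link phase, but this is not what we want: you're creating an unlinked object (cpucompile.o) at this point. Because that command has no -c, nvcc compiles main.cpp and preprocess.cpp and then immediately tries to link them into an executable. kernel.cu is not part of that command, so checktransposeppt has no definition and the link fails, which is why the undefined reference is reported from a temporary main.o. Furthermore, we don't normally build two separate unlinked objects (main.o and preprocess.o) into a single unlinked object.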
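The underlying rule can be sketched in a single, host-only C++ file. A prototype, like the one main.cpp has for checktransposeppt, is all the compiler needs to accept the call; the definition has to be supplied by some object file when the linker runs. In this sketch a hypothetical host-side stub (and a hypothetical kLen constant) stands in for the real definition in kernel.cu; it just adds arrays elementwise on the CPU, with no CUDA involved. If you delete that definition, or leave its object file out of the link command, the program still compiles but fails to link with an undefined reference.

```cpp
#include <cassert>
#include <cstddef>

// What main.cpp sees through its prototype: a declaration only.
// This is enough for the compiler to accept a call to the function.
void checktransposeppt(int *d_a, int *d_b, int *d_c);

// Hypothetical element count for this host-only sketch.
const std::size_t kLen = 4;

// Hypothetical host-side stand-in for the definition that really lives in
// kernel.cu. In the real build it is compiled into its own object file,
// and that object must be passed to the same link command as main.o.
// Here it simply adds the first kLen elements on the CPU.
void checktransposeppt(int *d_a, int *d_b, int *d_c)
{
    assert(d_a != nullptr && d_b != nullptr && d_c != nullptr);
    for (std::size_t i = 0; i < kLen; ++i) {
        d_c[i] = d_a[i] + d_b[i];
    }
}
```

For example, calling checktransposeppt(a, b, c) with a = {1, 2, 3, 4} and b = {10, 20, 30, 40} leaves c = {11, 22, 33, 44}.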

In general I would recommend switching to a makefile format that simply treats .cu files and .cpp files in a similar fashion - create a target for each, and build each into an object. Then link all the objects together to create the executable. There is no need to try to create a separate "gpu object" that includes all the GPU code, and separate "cpu object" that includes all the CPU code, and then link them together.

You have a separate issue in your code: you are defining M and the other variables in common.h and then including that header in multiple files. Adding extern does not help when an initializer is present, because extern int M=256; is still a definition. This will result in a multiple-definition link error. There are various ways to fix this. One possible approach is to modify your common.h file like this:

#ifndef COMMON_H
#include <cuda.h>
#include <cuda_runtime.h>

#define COMMON_H

extern int M;
extern int P;
extern int N;
extern int blksize;

extern dim3 gridsize;
extern dim3 blocksize;

#endif

Then add the following definitions, with their initial values, to the top of exactly one of your files, such as main.cpp:

int M=256;
int P=128;
int N=64;
int blksize=16;

dim3 gridsize(M/blksize,N/blksize,1);
dim3 blocksize(blksize,blksize,1);
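Why this split avoids the multiple-definition error can be sketched in one host-only C++ file: extern int M; is only a declaration and may appear any number of times (effectively once per file that includes common.h), while int M = 256; is a definition and must appear exactly once in the whole program. Since dim3 comes from the CUDA headers, the sketch assumes a hypothetical Dim3 struct with the same x, y, z fields, plus a hypothetical totalThreads() helper that checks the grid covers the M x N result, so it builds with a plain C++ compiler.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, so this builds without CUDA.
struct Dim3 {
    unsigned x, y, z;
};

// ---- What the fixed common.h contributes to every file that includes it ----
// These are declarations only; repeating them is legal, which is what
// happens when several source files include the same header.
extern int M;
extern int P;
extern int N;
extern int blksize;
extern Dim3 gridsize;
extern Dim3 blocksize;
extern int M;  // a second declaration of the same variable is fine

// ---- What exactly one source file (e.g. main.cpp) contains ----
// These are definitions; the program may contain only one of each.
int M = 256;
int P = 128;
int N = 64;
int blksize = 16;

// One blksize x blksize block per tile of the M x N result
// (assumes M and N are multiples of blksize, as in the question).
Dim3 gridsize{static_cast<unsigned>(M / blksize),
              static_cast<unsigned>(N / blksize), 1u};
Dim3 blocksize{static_cast<unsigned>(blksize),
               static_cast<unsigned>(blksize), 1u};

// Hypothetical helper: total threads in the launch, checking that rows
// (x dimension) cover M and columns (y dimension) cover N, matching the
// row/col mapping in the question's kernel.
unsigned totalThreads()
{
    unsigned tx = gridsize.x * blocksize.x;
    unsigned ty = gridsize.y * blocksize.y;
    assert(tx == static_cast<unsigned>(M) && ty == static_cast<unsigned>(N));
    return tx * ty;
}
```

With the values from the question, gridsize works out to 16 x 4 x 1 blocks of 16 x 16 threads, i.e. 16384 threads, one per element of the 256 x 64 result.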

With those changes, and using a makefile like this:

NVCC=nvcc -O3 -arch=sm_20 -Xcompiler -Wall -Xcompiler -Wextra -m64

all: app

app: kernel.o main.o preprocess.o Makefile
        $(NVCC) -o app kernel.o main.o preprocess.o

kernel.o: kernel.cu
        $(NVCC) -c kernel.cu

main.o: main.cpp
        $(NVCC) -x cu -c main.cpp

preprocess.o: preprocess.cpp
        $(NVCC) -x cu -c preprocess.cpp

clean:
        rm -f *.o app

I was able to build your code with the following caveats:

You didn't provide a utils.h, so I created one.
There are still various compile/link warnings. These are due to your code, not the construction of the makefile.
