1 vote

I have some .cpp files which implement Smoothed Particle Hydrodynamics (SPH), a particle method for modelling fluid flow.

One of the most time-consuming components of these particle techniques is finding the nearest neighbours (k-nearest neighbours or range searching) for every particle at every time-step of the simulation.

Right now I just want to accelerate the neighbour search routine using GPUs and CUDA, replacing my current CPU-based neighbour search routine. Only the neighbour search will run on the GPU while the rest of the simulation proceeds on the CPU.

My question is: how should I go about compiling the entire code? To be more specific, suppose I write the neighbour-search kernel function in a file nsearch.cu.

Should I then rename all my previous .cpp files as .cu files and recompile the whole set (along with nsearch.cu) using nvcc? At least for simple examples, nvcc does not compile CUDA code in files with a .cpp extension: nvcc foo.cu compiles, but the same code in hello.cpp doesn't.

In short, what should be the structure of this CUDA plugin and how should I go about compiling it?

I am using Ubuntu Linux 10.10, CUDA 4.0, an NVIDIA GTX 570 (compute capability 2.0), and the gcc compiler for my work.


2 Answers

2 votes

You need to write the nsearch.cu file and compile it with "nvcc -c nsearch.cu -o nsearch.o", then link nsearch.o into the main application. There also has to be an nsearch.h file that exports a wrapper around the actual kernel:

in nsearch.h:

    void kern();   // host-side wrapper, callable from the .cpp files

in nsearch.cu:

    #include "nsearch.h"

    __global__ void kern__() {
        // ... neighbour search kernel body ...
    }

    void kern() {
        kern__<<<...>>>();   // fill in your grid/block configuration
    }
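With this split, your existing .cpp files never see any CUDA syntax; they just include nsearch.h and call kern(). The build then looks something like this (the file names here are placeholders for your project, and the library path assumes a standard 64-bit CUDA install):

    nvcc -c nsearch.cu -o nsearch.o
    g++ -c main.cpp -o main.o            # ...and the rest of your .cpp files
    g++ main.o nsearch.o -o sph -L/usr/local/cuda/lib64 -lcudart

Linking against libcudart resolves the CUDA runtime calls that the kernel launch in nsearch.cu compiles down to.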
0 votes

This is a broader response to your question, since I have been through a very similar thought process to yours: moving my hydrodynamics code onto the GPU whilst leaving everything else on the CPU. Although I think that is where you should start, I also think you should start planning to move all of your other code onto the GPU as well. What I found is that whilst the GPU was very good at doing the matrix decomposition required for my simulation, the memory boundary between GPU and CPU memory was so slow that something like 80-90% of the GPU simulation time was being spent in cudaMemcpyDeviceToHost/cudaMemcpyHostToDevice.
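Before committing either way, it is worth measuring how much of your own GPU time goes into the copies. A minimal sketch using CUDA events (the helper time_copy and the buffer names h_pos/d_pos are hypothetical, not part of your code):

    #include <cstdio>
    #include <cuda_runtime.h>

    // Times a single host->device copy of a particle position buffer.
    void time_copy(const float* h_pos, float* d_pos, size_t bytes) {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, 0);
        cudaMemcpy(d_pos, h_pos, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("host->device copy: %.3f ms\n", ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }

If the copy time dominates the kernel time, the usual first fix is to keep the particle arrays resident on the device between time-steps rather than copying them back and forth every step.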