3 votes

I have a CUDA kernel called update which takes two float* as input and updates the first one. After the update, I need to refresh the OpenGL VBO with the new data from the first pointer. I've been looking at CUDA-GL interop, but I found it hard to understand. I'm looking for a clean and easy way to update a VBO using the data from a device pointer. I imagined something like this:

//initialize VBO
glGenBuffers(1, &vboID);
glBindBuffer(GL_ARRAY_BUFFER, vboID);
glBufferData(GL_ARRAY_BUFFER, sizeof(float)*SIZE, (void*)0, GL_STREAM_DRAW);
cudaMalloc((void**)&positions, sizeof(float)*SIZE);


//per frame code
glBindBuffer(GL_ARRAY_BUFFER, vboID);
update<<<SIZE/TPB, TPB>>>(positions, velocities);
//somehow transfer the data from the positions pointer to the VBO
glBindBuffer(GL_ARRAY_BUFFER, 0);
The GL-CUDA interop is the clean and simple way to do what you want. What don't you understand exactly? – talonmies

The only thing I don't understand is which commands I have to use to store the data from the device pointer to the VBO. I don't get what the map and unmap commands are doing and how I have to use them. – Dynamitos

You don't have to use any commands. When you map a resource, the device pointer is the GL resource. – talonmies

The whole point of CUDA interop is that you don't "transfer the data from the positions pointer to the VBO". You instead give CUDA the buffer object(s) you want to write to, instead of allocating CUDA memory. – Nicol Bolas

You're going about it wrong. If you want to make efficient use of CUDA/OpenGL interop, you want to start with a pointer that is provided by OpenGL. You register that with CUDA, then map it, then do CUDA operations on it. Then you can "unmap" it when you want to return it to OpenGL for further processing. This presentation (starting around slide 35, but you might want to study all of it) gives a completely worked tutorial for using an OpenGL vertex buffer with CUDA. – Robert Crovella

1 Answer

5 votes

The basic idea with CUDA/OpenGL interop is that you create a resource (e.g. VBO, PBO, etc.) using OpenGL. Using OpenGL, you allocate a buffer for that resource. Using CUDA/OpenGL interop, you register that resource with CUDA. Before using that resource in CUDA, you map the resource to obtain a pointer to the underlying allocation that is usable by CUDA.

You then operate on that allocation using CUDA, and you can "return" that resource to OpenGL (for further processing, display, etc.) by unmapping the resource.

In the case of an OpenGL VBO, the API sequence might look like this:

// create allocation/pointer using OpenGL
GLuint vertexArray;
glGenBuffers(1, &vertexArray);
glBindBuffer(GL_ARRAY_BUFFER, vertexArray);
glBufferData(GL_ARRAY_BUFFER, numVertices * 16, NULL, GL_DYNAMIC_COPY);
cudaGLRegisterBufferObject(vertexArray);

void *vertexPointer;
// Map the buffer to CUDA
cudaGLMapBufferObject(&vertexPointer, vertexArray);
// Run a kernel to create/manipulate the data
MakeVerticesKernel<<<gridSz, blockSz>>>(vertexPointer, numVertices);
// Unmap the buffer
cudaGLUnmapBufferObject(vertexArray);

// Bind the Buffer
glBindBuffer(GL_ARRAY_BUFFER, vertexArray);
// Enable Vertex and Color arrays
glEnableClientState( GL_VERTEX_ARRAY );
glEnableClientState( GL_COLOR_ARRAY );
// Set the pointers to the vertices and colors
glVertexPointer(3,GL_FLOAT,16,0);
glColorPointer(4,GL_UNSIGNED_BYTE,16,12);

glDrawArrays(GL_POINTS, 0, numVertices);
SwapBuffer();
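Note that the cudaGL* calls above are the older interop API, which has since been deprecated in favor of the cudaGraphics* API. As a rough sketch (assuming the same vertexArray buffer and kernel launch parameters as above, and that your kernel writes float data), the equivalent sequence looks like this:

```cpp
// Sketch only: the same register/map/unmap flow using the
// current cudaGraphics* interop API.
cudaGraphicsResource *cudaVboResource;

// Register once, after glBufferData; WriteDiscard hints that
// CUDA will overwrite the whole buffer each frame.
cudaGraphicsGLRegisterBuffer(&cudaVboResource, vertexArray,
                             cudaGraphicsMapFlagsWriteDiscard);

// Per frame: map, get a device pointer, run your kernel, unmap
float *devPtr;
size_t numBytes;
cudaGraphicsMapResources(1, &cudaVboResource, 0);
cudaGraphicsResourceGetMappedPointer((void **)&devPtr, &numBytes,
                                     cudaVboResource);
// launch your kernel on devPtr here, e.g.
// update<<<gridSz, blockSz>>>(devPtr, velocities);
cudaGraphicsUnmapResources(1, &cudaVboResource, 0);

// At shutdown
cudaGraphicsUnregisterResource(cudaVboResource);
```

While the resource is mapped, OpenGL must not use the buffer; unmapping hands it back to OpenGL for drawing.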

This presentation (e.g. starting at slide 36) outlines the general sequence.

The simpleGL CUDA sample code gives a fully worked example.