
I have the following scenario:

I have a custom FBO with a texture as its color attachment. I render my scene into that FBO. Next I need to share that texture with CUDA and run a post-processing kernel on it. Afterwards the texture should be bound to a full-screen quad and rendered to the default framebuffer. I have read several OpenGL/CUDA interop tutorials, but some of the steps are not completely clear to me.

First, what I usually see is that they read data from a GL texture X, process it in CUDA, and then fill a texture Y with the resulting data via a PBO.

Another thing I noticed (correct me if I am wrong) is that OpenGL in those demos appears to have a PBO bound by default, which would mean the first-pass rendering results are stored into it? I am really unsure about this, as all those demos use fixed-function OpenGL and I see no place where a PBO is bound while the initial geometry pass is rendered.

So back to my case. My final question is: can I operate directly on an OpenGL texture in CUDA, without using a PBO, so that I can modify it in a CUDA kernel? If not, does that mean I have to pack the FBO texture into a PBO before passing it to the CUDA stage?

UPDATE:

Filling a PBO from the framebuffer is usually done using glReadPixels(), which means the data is downloaded to the CPU. That is something I want to prevent. - THAT WAS A WRONG ASSUMPTION (see the answer below). So, given that I can fill a PBO with pixels from a texture, is the following the way to go?

1. Fill the PBO with data from the texture.

2. Map it to a CUDA buffer resource.

3. Modify the data with a kernel.

4. Update the target texture from the modified PBO.

5. Use the updated texture in OpenGL.
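The steps above would look roughly like this. This is only a sketch under my assumptions: an RGBA8 texture of size width x height, a trivial invert kernel standing in for the real post-processing, GL function loading already done, and all error checking omitted:

```cuda
#include <cuda_gl_interop.h>   // assumes GL headers/loader are included first

GLuint pbo;                            // pixel buffer object used as proxy
cudaGraphicsResource* pboRes = nullptr;

// Placeholder kernel: inverts the color channels in place.
__global__ void myKernel(uchar4* px, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    uchar4 p = px[y * w + x];
    px[y * w + x] = make_uchar4(255 - p.x, 255 - p.y, 255 - p.z, p.w);
}

void init(int width, int height) {
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, nullptr,
                 GL_DYNAMIC_COPY);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    // Register the PBO with CUDA once; map/unmap it every frame.
    cudaGraphicsGLRegisterBuffer(&pboRes, pbo, cudaGraphicsRegisterFlagsNone);
}

void postProcess(GLuint tex, int width, int height) {
    // 1. Texture -> PBO (stays on the GPU; last arg is an offset, not a pointer).
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBindTexture(GL_TEXTURE_2D, tex);
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

    // 2./3. Map the PBO into CUDA and run the kernel on it.
    cudaGraphicsMapResources(1, &pboRes);
    uchar4* devPtr; size_t size;
    cudaGraphicsResourceGetMappedPointer((void**)&devPtr, &size, pboRes);
    dim3 block(16, 16), grid((width + 15) / 16, (height + 15) / 16);
    myKernel<<<grid, block>>>(devPtr, width, height);
    cudaGraphicsUnmapResources(1, &pboRes);

    // 4. PBO -> texture (again GPU-side).
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    // 5. tex can now be sampled by the full-screen-quad pass.
}
```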


2 Answers


Here is an example of processing an "OpenGL Texture" in CUDA and then immediately using it in OpenGL with no additional overhead:

https://github.com/nvpro-samples/gl_cuda_interop_pingpong_st


Filling PBO from Frame buffer is usually done using glReadPixels(), which means it is downloaded to CPU. That is something I want to prevent.

Wrong!

glReadPixels into a PBO is carried out entirely on the GPU and does not do a roundtrip through system memory.
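To make this concrete: while a buffer is bound to GL_PIXEL_PACK_BUFFER, the pointer argument of glReadPixels() is reinterpreted as a byte offset into that buffer, so the pixels land in the PBO on the GPU. A sketch, assuming `pbo`, `width` and `height` are set up elsewhere:

```cuda
// Read the current framebuffer into the PBO without touching the CPU.
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE,
             (void*)0);                  // offset 0 into the PBO, not a CPU pointer
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
```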

Update:

There are several constraints on CUDA-graphics interop. For example, you cannot map a graphics resource to CUDA memory while it is bound in the graphics context; more precisely, you can map it, but any access to it then yields undefined results. Hence the usual strategy employs a proxy object.
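A minimal sketch of that rule, assuming `texRes` is a texture previously registered with cudaGraphicsGLRegisterImage and currently attached to the bound FBO:

```cuda
// Unbind the resource from the GL pipeline before mapping it into CUDA,
// and unmap it before using it in GL again.
glBindFramebuffer(GL_FRAMEBUFFER, 0);    // texture is no longer a render target
cudaGraphicsMapResources(1, &texRes);
cudaArray_t arr;
cudaGraphicsSubResourceGetMappedArray(&arr, texRes, 0, 0);
// ... access arr from CUDA here; any GL use of the texture now is undefined ...
cudaGraphicsUnmapResources(1, &texRes);  // hand the texture back to OpenGL
```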

So using a PBO as an intermediary between OpenGL and CUDA is a worthwhile method if the OpenGL resource in question cannot be unbound for whatever reason. In the case of FBOs, however, it doesn't really matter: whatever is attached to an FBO cannot be used as a data source as long as it is attached. Due to that limitation there are normally several instances of the target objects (renderbuffers or textures) present, used round-robin in a multibuffer fashion.

So either you copy it, or you unbind it before mapping it in CUDA. With multiple buffers the latter is the preferred method.

When using CUDA to process textures you should always write to a different texture than the one you're reading from (and to write to a texture from CUDA at all, you have to map it to a CUDA surface).
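A sketch of that read-one, write-another pattern without any PBO: register both GL textures as images, map them, read the source through a texture object and write the destination through a surface object. This assumes both textures are GL_RGBA8 of size w x h; the kernel and names are illustrative and error checking is omitted:

```cuda
#include <cuda_gl_interop.h>

__global__ void invert(cudaTextureObject_t src, cudaSurfaceObject_t dst,
                       int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float4 p = tex2D<float4>(src, x + 0.5f, y + 0.5f);  // normalized-float read
    surf2Dwrite(make_uchar4((1.f - p.x) * 255, (1.f - p.y) * 255,
                            (1.f - p.z) * 255, p.w * 255),
                dst, x * sizeof(uchar4), y);            // x coordinate in bytes
}

void run(GLuint srcTex, GLuint dstTex, int w, int h) {
    cudaGraphicsResource *srcRes, *dstRes;
    cudaGraphicsGLRegisterImage(&srcRes, srcTex, GL_TEXTURE_2D,
                                cudaGraphicsRegisterFlagsReadOnly);
    // The destination needs the surface-load-store flag to be writable.
    cudaGraphicsGLRegisterImage(&dstRes, dstTex, GL_TEXTURE_2D,
                                cudaGraphicsRegisterFlagsSurfaceLoadStore);
    cudaGraphicsResource* both[2] = { srcRes, dstRes };
    cudaGraphicsMapResources(2, both);

    cudaArray_t srcArr, dstArr;
    cudaGraphicsSubResourceGetMappedArray(&srcArr, srcRes, 0, 0);
    cudaGraphicsSubResourceGetMappedArray(&dstArr, dstRes, 0, 0);

    cudaResourceDesc rd = {};
    rd.resType = cudaResourceTypeArray;
    rd.res.array.array = srcArr;
    cudaTextureDesc td = {};
    td.readMode   = cudaReadModeNormalizedFloat;
    td.filterMode = cudaFilterModePoint;
    cudaTextureObject_t srcObj;
    cudaCreateTextureObject(&srcObj, &rd, &td, nullptr);
    rd.res.array.array = dstArr;
    cudaSurfaceObject_t dstObj;
    cudaCreateSurfaceObject(&dstObj, &rd);

    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    invert<<<grid, block>>>(srcObj, dstObj, w, h);

    cudaDestroySurfaceObject(dstObj);
    cudaDestroyTextureObject(srcObj);
    cudaGraphicsUnmapResources(2, both);   // dstTex is now usable in OpenGL
}
```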