2
votes

I have a need to stream a texture (essentially a camera feed).

With object streaming, the following scenarios seem to arise:

  1. Is the new object's data store larger, smaller or same size as the old one?
  2. Subset of or whole texture being updated?
  3. Are we streaming a buffer object or texture object (any difference?)

Here are the following approaches I have come across:

  1. Allocate object data store (either BufferData for buffers or TexImage2D for textures) and then each frame, update subset of data with BufferSubData or TexSubImage2D

  2. Nullify/invalidate the object after the last call (eg. draw) that uses the object either with:

    • Nullify: glTexSubImage2D( ..., NULL), glBufferSubData( ..., NULL)
    • Invalidate: glInvalidateBufferData(), glMapBufferRange with the GL_MAP_INVALIDATE_BUFFER_BIT, glDeleteTextures ?
  3. Simply re-invoke BufferData or TexImage2D with the new data

  4. Manually implement object multi-buffering / buffer ping-ponging.

Most immediately, my problem scenario is: the entire texture is replaced with a new one of the same size. How do I implement this? Will (1) implicitly synchronize? Does (2) avoid the synchronization? Will (3) synchronize, or will a new data store be allocated for the object, so that our update can be uploaded without waiting for all drawing that uses the old object state to finish? This passage from the Red Book V4.3 makes me believe so:

Data can also be copied between buffer objects using the glCopyBufferSubData() function. Rather than assembling chunks of data in one large buffer object using glBufferSubData(), it is possible to upload the data into separate buffers using glBufferData() and then copy from those buffers into the larger buffer using glCopyBufferSubData(). Depending on the OpenGL implementation, it may be able to overlap these copies because each time you call glBufferData() on a buffer object, it invalidates whatever contents may have been there before. Therefore, OpenGL can sometimes just allocate a whole new data store for your data, even though a copy operation from the previous store has not completed yet. It will then release the old storage at a later opportunity.

But if so, why the need for (2) [nullify/invalidate]?

Also, please discuss the above approaches, and others, and their effectiveness for the various scenarios, while keeping in mind at least the following issues:

  1. Whether implicit synchronization to object (ie. synchronizing our update with OpenGL's usage) occurs
  2. Memory usage
  3. Speed

I've read http://www.opengl.org/wiki/Buffer_Object_Streaming but it doesn't offer conclusive information.

1
There's a good chapter about async buffer uploads in "OpenGL Insights". Try getting a copy... - peppe

1 Answer

5
votes

Let me try to answer at least a few of the questions you raised.

The scenarios you describe can have a great impact on the performance of the different approaches, especially the first point about the dynamic size of the buffer. In your scenario of video streaming, the size will rarely change, so a more expensive "re-configuration" of the data structures you use might be acceptable. If the size changes every frame or every few frames, this is typically not feasible. However, if a reasonable maximum size limit can be enforced, just using buffers/textures with the maximum size might be a good strategy. With neither buffers nor textures do you have to use all the space there is (although there are some smaller issues when you do this with textures, like wrap modes).
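To make the "maximum size" idea concrete, here is a minimal sketch of the arithmetic involved, assuming the frame is stored in the corner of the larger texture that starts at texel (0, 0). The names `UvExtent` and `used_uv_extent` are made up for this illustration and are not part of OpenGL.

```c
/* Hypothetical helper: the texture-coordinate extent covered by a
 * frame_w x frame_h frame stored at texel origin (0, 0) of a larger
 * tex_w x tex_h texture. */
typedef struct {
    float u_max;
    float v_max;
} UvExtent;

static UvExtent used_uv_extent(int frame_w, int frame_h, int tex_w, int tex_h)
{
    UvExtent e;
    e.u_max = (float)frame_w / (float)tex_w;  /* fraction of the width in use  */
    e.v_max = (float)frame_h / (float)tex_h;  /* fraction of the height in use */
    return e;
}
```

You would then sample only within [0, u_max] x [0, v_max]. This is also where the wrap-mode issue comes from: GL_REPEAT and friends wrap at the edge of the whole texture, not at the edge of the part you actually use.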

3. Are we streaming a buffer object or texture object (any difference?)

Well, the only way to efficiently stream image data to or from the GL is to use pixel buffer objects (PBOs). So you always have to deal with buffer objects in the first place, no matter whether vertex data, image data or any other data is to be transferred. In the texture case, the buffer is just the source for some glTex*Image() call, and of course you'll need a texture object for that.
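As a rough sketch of what "the buffer is the source for the glTex*Image() call" looks like in code: the function below assumes a current GL context, GLEW as the function loader (any loader would do), tightly packed RGBA8 pixels, default pixel-store state, and a texture that already has storage of the given size. `upload_frame_via_pbo`, `pbo` and `tex` are names invented for this example.

```c
#include <GL/glew.h>  /* assumption: GLEW provides the GL entry points */

/* Hypothetical helper: push one RGBA8 frame into `tex` through `pbo`.
 * `tex` must already have width x height storage (e.g. from glTexImage2D). */
static void upload_frame_via_pbo(GLuint pbo, GLuint tex, const void *pixels,
                                 GLsizei width, GLsizei height)
{
    const GLsizeiptr size = (GLsizeiptr)width * height * 4;

    /* Step 1: get the pixels into the buffer object. How you do this
     * (re-specify, orphan, map, ring of buffers) is the streaming question. */
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, size, pixels, GL_STREAM_DRAW);

    /* Step 2: while a buffer is bound to GL_PIXEL_UNPACK_BUFFER, the last
     * argument is a byte offset into that buffer, not a client pointer. */
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0);

    /* Unbind so later texture uploads read from client memory again. */
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}
```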

Let's come to your approaches:

In approach (1), you use the "Sub" variant of the update commands. In that case, the storage of the existing object is updated (in part or as a whole). This is likely to trigger an implicit synchronization if the old data is still in use. The GL basically has only two options: wait for all operations (potentially) depending on that data to complete, or make an intermediate copy of the new data and let the client go on. Neither option is good from a performance point of view.

In approach (2), you have a misconception. The "Sub" variants of the update commands will never invalidate/orphan your buffers. The "non-sub" glBufferData() will create completely new storage for the object, and calling it with NULL as the data pointer will leave that storage uninitialized. Internally, the GL implementation might re-use some memory that was in use for earlier buffer storage. So if you follow this scheme and always use the same buffer size, there is some probability that you effectively end up with a ring buffer over the same memory areas.

The other invalidation methods you mentioned also allow you to invalidate only parts of the buffer, and give you more fine-grained control over what is happening.

Approach (3) is basically the same as (2) with the glBufferData() orphaning, except that you specify the new data directly at this stage.
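To make the difference between these variants concrete, here is a minimal sketch, assuming a current GL 3.0+ context, GLEW as the loader, and a buffer `buf` that was already created. The function names are invented for this example, and GL_ARRAY_BUFFER is an arbitrary choice of target.

```c
#include <string.h>
#include <GL/glew.h>  /* assumption: GLEW provides the GL entry points */

/* Approach (2), orphaning: ask for fresh, uninitialized storage, then fill
 * it. The "Sub" call alone would not orphan anything. */
static void update_orphan_then_fill(GLuint buf, const void *data, GLsizeiptr size)
{
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STREAM_DRAW);
    glBufferSubData(GL_ARRAY_BUFFER, 0, size, data);
}

/* Approach (2), mapping variant: tell the GL that the old contents are no
 * longer needed. The buffer must already have at least `size` bytes of
 * storage. Returns 1 on success, 0 on failure. */
static int update_map_invalidate(GLuint buf, const void *data, GLsizeiptr size)
{
    void *dst;

    glBindBuffer(GL_ARRAY_BUFFER, buf);
    dst = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
                           GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
    if (!dst)
        return 0;
    memcpy(dst, data, (size_t)size);
    return glUnmapBuffer(GL_ARRAY_BUFFER) == GL_TRUE;
}

/* Approach (3): new storage and new contents in a single call. */
static void update_respecify(GLuint buf, const void *data, GLsizeiptr size)
{
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferData(GL_ARRAY_BUFFER, size, data, GL_STREAM_DRAW);
}
```

Whether any of these actually avoids a stall is up to the implementation; the calls only give it the opportunity to hand out different memory.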

Approach (4) is the one I would actually recommend, as it gives the application the most control over what is happening, without having to rely on the specific internal workings of the GL implementation.

Without taking synchronization into account, the "sub" variant of the update commands is more efficient, even if the whole data storage is to be changed and not just some part. That is because the "non-sub" variants of the commands basically recreate the storage, which introduces some overhead. By managing the ring buffers manually and just using the "sub" variants of the update functions, you can avoid that overhead, and you don't have to rely on the GL being clever. At the same time, you can avoid implicit synchronization by only updating buffers which aren't in use by the GL any more. This scheme can also be extended nicely to a multi-threaded scenario. You can have one (or several) extra threads with separate (but shared) GL contexts fill the buffers for you, and just pass the buffer handles to the draw thread as soon as the update is complete. You can also just map the buffers in the draw thread and let them be filled by worker threads (without the need for additional GL contexts at all).
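A single-threaded sketch of such a manually managed ring might look like the following. It assumes a current GL 3.2+ context (for sync objects), GLEW as the loader, tightly packed RGBA8 frames of a fixed size, and a texture with matching storage. `UploadRing`, `RING_SIZE` and the function names are invented for this example; fences are just one way to find out that a buffer is no longer in use.

```c
#include <GL/glew.h>  /* assumption: GLEW provides the GL entry points */

#define RING_SIZE 3   /* assumption: three slots; tune for your latency needs */

typedef struct {
    GLuint     pbo[RING_SIZE];
    GLsync     fence[RING_SIZE];  /* NULL while the slot is known to be idle */
    GLsizeiptr frame_bytes;
    unsigned   next;
} UploadRing;

static void ring_init(UploadRing *r, GLsizeiptr frame_bytes)
{
    int i;
    r->frame_bytes = frame_bytes;
    r->next = 0;
    glGenBuffers(RING_SIZE, r->pbo);
    for (i = 0; i < RING_SIZE; ++i) {
        r->fence[i] = NULL;
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, r->pbo[i]);
        /* Allocate once; later updates only use the "Sub" call. */
        glBufferData(GL_PIXEL_UNPACK_BUFFER, frame_bytes, NULL, GL_STREAM_DRAW);
    }
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}

/* Wait until the GL has finished the commands that read from this slot. */
static void ring_wait_slot(GLsync *fence)
{
    GLenum status;
    if (!*fence)
        return;
    do {  /* 1 ms timeout per attempt */
        status = glClientWaitSync(*fence, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000);
    } while (status == GL_TIMEOUT_EXPIRED);
    glDeleteSync(*fence);
    *fence = NULL;
}

/* width * height * 4 is assumed to equal r->frame_bytes. */
static void ring_upload(UploadRing *r, GLuint tex, const void *pixels,
                        GLsizei width, GLsizei height)
{
    const unsigned i = r->next;
    r->next = (i + 1) % RING_SIZE;

    /* With enough slots, the oldest one is usually idle already. */
    ring_wait_slot(&r->fence[i]);

    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, r->pbo[i]);
    glBufferSubData(GL_PIXEL_UNPACK_BUFFER, 0, r->frame_bytes, pixels);

    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0);

    /* Signals once the transfer out of this PBO has completed. */
    r->fence[i] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}
```

Call `ring_init` once and `ring_upload` once per frame. The explicit wait replaces the implicit synchronization: the application, not the driver, decides when and where it is willing to block.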

OpenGL 4.4 introduced GL_ARB_buffer_storage, and with it came the GL_MAP_PERSISTENT_BIT for glMapBufferRange. That allows you to keep all of the buffers mapped while they are used by the GL, so you avoid the overhead of mapping the buffers into the address space again and again. You then have no implicit synchronization at all, but you have to synchronize the operations manually. OpenGL's synchronization objects (see GL_ARB_sync) might help you with that, but the main burden of synchronization is on your application's logic itself. When streaming videos to the GL, just avoid immediately re-using the buffer which was the source for the glTexSubImage() call, and try to delay its re-use as long as possible. You are of course also trading latency for throughput here. If you need to minimize latency, you might have to tweak this logic a bit.
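A sketch of the persistently mapped variant, assuming a current GL 4.4 context (or GL_ARB_buffer_storage plus GL_ARB_sync), GLEW as the loader, and RGBA8 frames of a fixed size. Here one buffer is split into several slots; `PersistentRing`, `SLOTS` and the function names are invented for this example. GL_MAP_COHERENT_BIT is used so that no explicit flush of the written range is needed, which is a simplifying choice rather than a requirement.

```c
#include <stdint.h>
#include <string.h>
#include <GL/glew.h>  /* assumption: GLEW provides the GL entry points */

#define SLOTS 3       /* assumption: three frames' worth of storage */

typedef struct {
    GLuint         buf;
    unsigned char *base;          /* stays mapped for the buffer's lifetime */
    GLsizeiptr     slot_bytes;
    GLsync         fence[SLOTS];  /* NULL while the slot is known to be idle */
    unsigned       next;
} PersistentRing;

/* Returns 1 on success, 0 if the mapping failed. */
static int pring_init(PersistentRing *r, GLsizeiptr slot_bytes)
{
    const GLbitfield flags =
        GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
    int i;

    r->slot_bytes = slot_bytes;
    r->next = 0;
    for (i = 0; i < SLOTS; ++i)
        r->fence[i] = NULL;

    glGenBuffers(1, &r->buf);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, r->buf);
    /* Immutable storage: the size can never change afterwards. */
    glBufferStorage(GL_PIXEL_UNPACK_BUFFER, slot_bytes * SLOTS, NULL, flags);
    r->base = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0,
                               slot_bytes * SLOTS, flags);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    return r->base != NULL;
}

/* width * height * 4 is assumed to equal r->slot_bytes. */
static void pring_upload(PersistentRing *r, GLuint tex, const void *pixels,
                         GLsizei width, GLsizei height)
{
    const unsigned i = r->next;
    const GLsizeiptr offset = (GLsizeiptr)i * r->slot_bytes;
    r->next = (i + 1) % SLOTS;

    /* Manual synchronization: never write into a slot the GL may still read. */
    if (r->fence[i]) {
        GLenum status;
        do {  /* 1 ms timeout per attempt */
            status = glClientWaitSync(r->fence[i],
                                      GL_SYNC_FLUSH_COMMANDS_BIT, 1000000);
        } while (status == GL_TIMEOUT_EXPIRED);
        glDeleteSync(r->fence[i]);
        r->fence[i] = NULL;
    }

    /* Plain memory write; no map/unmap per frame. */
    memcpy(r->base + offset, pixels, (size_t)r->slot_bytes);

    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, r->buf);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE,
                    (const void *)(uintptr_t)offset);
    r->fence[i] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}
```

Because the slots are used round-robin, each slot is re-used as late as possible. More slots make a wait on the fence less likely, at the cost of more memory.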

Comparing the approaches for "memory usage" is really hard. There are a lot of implementation-specific details to consider here. A GL implementation might keep some old buffer memory around for some time to fulfill recreation requests of the same size. Also, a GL implementation might make shadow copies of any data at any time. The approaches which don't orphan and recreate storage all the time in principle give you more control over the memory which is in use.

"Speed" itself is also not a very useful metric. You basically have to balance throughput and latency here, according to the requirements of your application.