So I have a system (using OpenGL 4.x) where I am receiving a stream of points (potentially with color and/or normal) from an external source. I need to draw these points as GL_POINTS, running custom switchable shaders for coloring (color could be procedurally generated, or come from vertex color or normal direction).

The stream consists of groups of points (with or without normal or color) of arbitrary count (typically 1k to 70k points) arriving at a fairly regular interval (4 to 10 Hz). I need to add these points to my current points and draw all the points received so far.

I am guaranteed that my vertex type will not change; I am told at the beginning of the streaming which to expect, so I am using an interleaved vertex with one of: pos+normal+color, pos+normal, pos+color, or just pos.

My current solution is to allocate interleaved vertex VBOs (with surrounding VAOs) of the appropriate vertex type, each sized to a max vertex count specified in a config file (allocated with the DYNAMIC hint).

As new points come in, I fill up my current non-full VBO via glBufferSubData. I keep a count (activePoints) of how many vertices the current frontier VBO holds so far, and use glBufferSubData to fill in a range starting at activePoints. If my current update group has more vertices than can fit in my frontier buffer (since I limit the vertex count per VBO), then I allocate a new VBO and fill the range starting at 0 and ending with the number of points left in my update group (those not added to the last buffer). If I still have points left, I do this again and again. It is rare that an update group straddles more than 2 buffers.
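The splitting described above can be sketched as plain bookkeeping, separate from the GL calls. This is only a minimal illustration: `UploadSlice` and `planUpload` are hypothetical names, not part of my actual code, and the sketch assumes every VBO has the same capacity in vertices. Each returned slice would correspond to one glBufferSubData call with byte offset `dstVertex * stride` and byte size `count * stride`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One contiguous write into one fixed-capacity vertex buffer.
// (Hypothetical helper type, used only for this illustration.)
struct UploadSlice {
    std::size_t bufferIndex;  // which VBO in the pool receives the data
    std::size_t dstVertex;    // first vertex slot written in that VBO
    std::size_t srcVertex;    // first vertex read from the update group
    std::size_t count;        // number of vertices in this slice
};

// Splits `incoming` vertices across buffers holding `capacity` vertices each,
// starting at the frontier position (frontierBuffer, activePoints).
// Both frontier values are updated in place so the next update group
// continues where this one stopped.
std::vector<UploadSlice> planUpload(std::size_t capacity,
                                    std::size_t& frontierBuffer,
                                    std::size_t& activePoints,
                                    std::size_t incoming) {
    std::vector<UploadSlice> slices;
    if (capacity == 0) return slices;  // nothing can be stored

    std::size_t src = 0;
    while (src < incoming) {
        if (activePoints == capacity) {
            // Frontier is full: a new VBO would be allocated here.
            ++frontierBuffer;
            activePoints = 0;
        }
        const std::size_t room = capacity - activePoints;
        const std::size_t n = std::min(room, incoming - src);
        slices.push_back({frontierBuffer, activePoints, src, n});
        activePoints += n;
        src += n;
    }
    return slices;
}
```

After the call, `activePoints` is also the vertex count to pass to glDrawArrays for the frontier buffer.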

When rendering, I draw every VBO except the frontier with glDrawArrays(m_DrawMode,0,numVertices), where numVertices is equal to the max allowed buffer size, and I draw my frontier buffer with glDrawArrays(m_DrawMode,startElem,numElems) to account for it not being completely filled with valid vertices.

Of course, at some point I will have more points than I can draw interactively, so I have an LRU mechanism that deallocates the oldest (according to the LRU alg) sets of VBOs as needed.

Is there a more optimal method for doing this? Buffer orphaning? Streaming hint? Map vs SubData? Something else?

The second issue is that I am now asked to remove points (at irregular intervals), ranging from 10 to 2000 at a time. But these points are irregularly spaced within the order I received them initially. I can find out what offsets in which buffers they currently exist at, but it's more of a scattering than a range. I have been "removing them" by finding their offsets into the right buffers and, one by one, calling glBufferSubData with a range of 1 (it's rare that they are beside each other in a buffer), changing their pos to be somewhere far off where they will never be seen. Eventually, I guess buffers should be deleted as these remove requests add up, but I don't currently do that.

What would be a better way to handle that?

Instead of physically zeroing out unused vertices, I would highly suggest you use glDrawElements (...). Then you pass it an array of all the points that actually have data. You can keep a list internally of unused or free vertices, but you are definitely wasting bus bandwidth zeroing-out the free vertices. Just leave them like they are, but acknowledge they are not to be rendered / contain junk :) - Andon M. Coleman
So are you suggesting I delete and create a new VBO filled with indices (or IBO really), each time say 4 points out of 300k in the buffer need to be removed? And then draw with glDrawElements, as opposed to glDrawArrays? - Ryan
Not quite. I was suggesting that you allocate an IBO of equal size to your vertex buffer; initially it will be sequentially filled with values from 0-n (where n is your VBO size). When you want to remove a single vertex at a random location or a range of sequential vertices, simply use glBufferSubData (...) and replace the values at IndexArray [n] with an index that points to a valid vertex. This way, instead of having to redefine vertex position, or add a 32-bit w-coordinate in order to "hide" the removed vertices at render-time, you can get away with merely re-defining 16-bit indices. - Andon M. Coleman
You do not want to create or delete a new IBO for this operation. A 16-bit IBO of equal dimensionality as your vertex buffer will consume less memory than adding a w coordinate. In theory, rendering with a huge element array and glDrawElements (...) might be slower, but you'll definitely reduce the expense of the case where you remove randomly distributed vertices. And you will not increase the expense of defining (or updating) the positions of vertices, because you will not need an extra 32 bits per vertex. - Andon M. Coleman

1 Answer


Mapping may be more efficient than glBufferSubData, especially when having to "delete" points. Explicit flushing (GL_MAP_FLUSH_EXPLICIT_BIT together with glFlushMappedBufferRange) may be of particular help. Also, mapping allows you to offload the filling of a buffer to another thread.
Be positively sure to get the access bits correct (or performance is abysmal); in particular, do not map a region "read" if all you do is write.
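With explicit flushing you tell the driver which parts of the mapped region you actually touched, so scattered edits are best merged into as few ranges as possible first. Here is a minimal sketch of that merging step only; `ByteRange` and `coalesceDirty` are hypothetical names, and the sketch assumes a fixed vertex stride in bytes and vertex indices counted from the start of the mapped region. Each resulting range would then be passed to glFlushMappedBufferRange as (offset, length) on a buffer mapped with GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A contiguous dirty region, in bytes from the start of the mapped range.
// (Hypothetical helper type, used only for this illustration.)
struct ByteRange {
    std::size_t offset;
    std::size_t length;
};

// Turns a list of touched vertex indices (any order, duplicates allowed)
// into a sorted list of contiguous byte ranges. Adjacent vertices end up
// in a single range, so they need only one flush call.
std::vector<ByteRange> coalesceDirty(std::vector<std::size_t> indices,
                                     std::size_t stride) {
    std::sort(indices.begin(), indices.end());
    indices.erase(std::unique(indices.begin(), indices.end()), indices.end());

    std::vector<ByteRange> ranges;
    for (std::size_t i : indices) {
        const std::size_t off = i * stride;
        if (!ranges.empty() &&
            ranges.back().offset + ranges.back().length == off) {
            ranges.back().length += stride;   // extends the previous range
        } else {
            ranges.push_back({off, stride});  // starts a new range
        }
    }
    return ranges;
}
```

Whether merging nearby (not strictly adjacent) vertices into one larger flush is faster than several small ones depends on the driver, so that would need measuring.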

Deleting points from a vertex buffer is not easily possible, as you probably know. For "few" points (e.g. 10 or 20) I would just set w = 0, which moves them to infinity, and keep drawing the whole thing as before. If your clip plane is not at infinity, this will just discard them. With explicit flushing, you would not even need to keep a separate copy in memory.
For "many" points (e.g. 1,000), you may consider using glCopyBufferSubData to remove the "holes". Moving memory on the GPU is fast, and for thousands of points it's probably worth the trouble. You then need to maintain a count for every vertex buffer, so you draw fewer points after removing some.
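One way to close the holes without sliding the whole buffer is to fill each hole with a surviving vertex taken from the tail of the buffer. Note that this is a swapped-in variant, not necessarily what the paragraph above had in mind: it changes vertex order, which is harmless for GL_POINTS but means any external table of point offsets must be updated. A minimal sketch of the planning step, with hypothetical names (`VertexMove`, `CompactionPlan`, `planCompaction`): each move would become one glCopyBufferSubData call of `stride` bytes from `src * stride` to `dst * stride`. Since every source lies at or beyond the new count and every destination lies below it, the ranges never overlap within the same buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Copy of one vertex from slot `src` to slot `dst` in the same buffer.
// (Hypothetical helper types, used only for this illustration.)
struct VertexMove {
    std::size_t src;
    std::size_t dst;
};

struct CompactionPlan {
    std::vector<VertexMove> moves;
    std::size_t newCount;  // vertex count to draw after compaction
};

// Given the current vertex count and the indices to remove (any order,
// duplicates and out-of-range values tolerated), plans the moves that
// leave all surviving vertices in slots [0, newCount).
CompactionPlan planCompaction(std::size_t count,
                              std::vector<std::size_t> removed) {
    std::sort(removed.begin(), removed.end());
    removed.erase(std::unique(removed.begin(), removed.end()), removed.end());
    while (!removed.empty() && removed.back() >= count) removed.pop_back();

    CompactionPlan plan;
    plan.newCount = count - removed.size();

    // Holes below newCount must be filled; survivors at or above newCount
    // are the fillers. Removed slots in the tail are simply skipped.
    std::size_t src = plan.newCount;
    auto tailHole = std::lower_bound(removed.begin(), removed.end(),
                                     plan.newCount);
    for (auto hole = removed.begin();
         hole != removed.end() && *hole < plan.newCount; ++hole) {
        while (tailHole != removed.end() && *tailHole == src) {
            ++src;       // this tail slot was itself removed
            ++tailHole;
        }
        plan.moves.push_back({src, *hole});
        ++src;
    }
    return plan;
}
```

The number of copies is at most the number of removed points, so a removal of 1,000 points costs at most 1,000 single-vertex copies regardless of buffer size.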

To "delete" entire vertex buffers, you should just orphan them (and reuse them). OpenGL will then do the right thing on its own, and it's the most efficient way to keep drawing and reusing memory.

Using glDrawElements instead of glDrawArrays, as suggested in Andon M. Coleman's comment, is usually good advice, but it will not help you in this case. The reason why one would want to do that is that the post-transform cache works by tagging vertices by their index, so drawing elements takes advantage of the post-transform cache whereas drawing arrays does not. However, the post-transform cache is only useful on complex geometry such as triangle lists or triangle strips. You're drawing points, so you will not use the post-transform cache in any case -- but using indices increases memory bandwidth both on the GPU and on the PCIe bus.