How to optimize a VBO/IBO to maximize GPU cache usage

Question

I am generating a mesh from volumetric data using the Marching Cubes algorithm running on CUDA.

I have tried saving the mesh and rendering it in 3 ways.

1. Save a crude set of triangles as a contiguous array of vertex data. I estimate the size in the first pass, create an OpenGL VBO, map it to CUDA, and write the vertex data to it in the format below

V0x, V0y, V0z, N0x, N0y, N0z, V1x, V1y, V1z, N1x, N1y, N1z, ...

and draw it using glDrawArrays().

Redundant Vertices in VBO, Redundant Vertices per Cube, No Indices.

2. Take the mesh from step 1, use thrust::sort() and thrust::unique() to remove redundant vertices, and compute indices using thrust::lower_bound(). Save the results to an OpenGL VBO/IBO mapped to CUDA. Draw the model using glDrawElements().

No Redundant Vertices in VBO, Generated Indices.

3. Generate a unique list of vertices per cube, and store them in the VBO along with their indices forming triangles in the IBO. Render using glDrawElements().

Redundant Vertices in VBO, Unique Vertices per Cube, Generated Indices per Cube
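For reference, the sort/unique/lower_bound pipeline of Method 2 can be sketched on the CPU with the standard-library counterparts of the Thrust calls. This is not the asker's CUDA code: `Vertex`, `IndexedMesh`, and `buildIndexedMesh` are hypothetical names, and the sketch assumes exact float comparison is good enough to detect duplicate vertices.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Interleaved position + normal, matching the V/N layout described above.
using Vertex = std::array<float, 6>;

// Hypothetical container for the deduplicated result.
struct IndexedMesh {
    std::vector<Vertex> vertices;        // unique vertices (VBO contents)
    std::vector<std::uint32_t> indices;  // one index per soup vertex (IBO contents)
};

// Build an indexed mesh from a triangle soup (three vertices per triangle).
IndexedMesh buildIndexedMesh(const std::vector<Vertex>& soup) {
    IndexedMesh mesh;
    mesh.vertices = soup;

    // Sort lexicographically, then drop adjacent duplicates.
    std::sort(mesh.vertices.begin(), mesh.vertices.end());
    mesh.vertices.erase(std::unique(mesh.vertices.begin(), mesh.vertices.end()),
                        mesh.vertices.end());

    // Each soup vertex finds its slot in the sorted unique array.
    mesh.indices.reserve(soup.size());
    for (const Vertex& v : soup) {
        auto it = std::lower_bound(mesh.vertices.begin(), mesh.vertices.end(), v);
        mesh.indices.push_back(static_cast<std::uint32_t>(it - mesh.vertices.begin()));
    }
    return mesh;
}
```

Note that after the sort the vertex array is ordered by coordinate value, while the index array still follows the original triangle order, so consecutive triangles may reference vertices that lie far apart in the VBO.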

The FPS I get for the same dataset at the same iso-value is

Method 1 : 92  FPS, 30,647,016 Verts,          0 Indices
Method 2 : 122 FPS,  6,578,066 Verts, 30,647,016 Indices
Method 3 : 140 FPS, 20,349,880 Verts, 30,647,016 Indices

Even though Method 2 yields the fewest vertices, its FPS is lower than Method 3's. I believe this is because its indices are in an order that makes poor use of the GPU cache, whereas the index order in Method 3 gets better cache usage, hence the higher FPS.
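One way to test this hypothesis offline is to estimate the average cache miss ratio (ACMR), the number of simulated vertex-shader invocations per triangle, for each index buffer. The sketch below assumes a simple FIFO post-transform cache; `computeACMR` and the `cacheSize` parameter are hypothetical, and real GPUs may batch and cache vertices differently, so treat the result as a rough comparison metric rather than a prediction of FPS.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simulated vertex transforms per triangle for a triangle-list index buffer.
// cacheSize is an assumed FIFO capacity, not a value queried from hardware.
double computeACMR(const std::vector<std::uint32_t>& indices, std::size_t cacheSize) {
    if (indices.size() < 3) return 0.0;

    std::deque<std::uint32_t> fifo;  // oldest entry at the front
    std::size_t misses = 0;

    for (std::uint32_t idx : indices) {
        // A hit leaves the FIFO untouched.
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end()) continue;

        // A miss costs one transform and evicts the oldest entry when full.
        ++misses;
        fifo.push_back(idx);
        if (fifo.size() > cacheSize) fifo.pop_front();
    }
    return static_cast<double>(misses) / static_cast<double>(indices.size() / 3);
}
```

The value ranges from 3.0 (every index misses, as in a triangle soup) down to the ratio of unique vertices to triangles. For the Method 2 numbers above, that lower bound is roughly 6,578,066 / 10,215,672 ≈ 0.64, so a measured value well above it would support the cache-order explanation.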

How can I modify Method 2 to yield a higher FPS?

Does your FPS measurement method take into account the time it takes to remove the redundant vertices using thrust? — m.s.
@m.s. The removal is only done once. I am not looking for a real-time removal method. While the iso-value is being changed, I simply render the crude mesh from Method 1. Once the iso-value stops changing, I run the removal, which takes about 4 seconds. After that I just use the VBO/IBO to render the mesh. I want a higher FPS for this mesh while just rendering; no extraction or compaction is done when I measure the FPS. — Harish

Jerem Jerem · Accepted Answer · 2015-07-06T06:49:17

Two things can help:

trying to optimize data cache usage by putting the vertices roughly in the order you will draw them
trying to optimize post transform cache usage (there is an algorithm to do that here, and implementations can probably be found on the net)
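The first suggestion can be sketched as a single pass over the index buffer that moves each vertex to the position of its first use. This is a minimal illustration, not the answerer's code: `reorderVerticesByFirstUse` is a hypothetical helper, it assumes every index is smaller than `vertices.size()`, and it drops vertices that no triangle references.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Returns the vertices in order of first reference and rewrites `indices`
// in place so they point into the returned array. Triangle order is unchanged.
template <typename Vertex>
std::vector<Vertex> reorderVerticesByFirstUse(const std::vector<Vertex>& vertices,
                                              std::vector<std::uint32_t>& indices) {
    const std::uint32_t unassigned = std::numeric_limits<std::uint32_t>::max();
    std::vector<std::uint32_t> remap(vertices.size(), unassigned);

    std::vector<Vertex> reordered;
    reordered.reserve(vertices.size());

    for (std::uint32_t& idx : indices) {
        if (remap[idx] == unassigned) {
            // First time this vertex is drawn: append it to the new VBO order.
            remap[idx] = static_cast<std::uint32_t>(reordered.size());
            reordered.push_back(vertices[idx]);
        }
        idx = remap[idx];
    }
    return reordered;
}
```

Because this pass only follows the existing triangle order, it makes sense to run it after any triangle reordering done for the post-transform cache, so that the vertex layout matches the final draw order.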
