I am generating a mesh from volumetric data using the Marching Cubes algorithm running on CUDA.
I have tried saving the mesh and rendering it in three ways:
- Save a crude set of triangles as a contiguous array of vertex data. I estimate the size in the first pass, create an OpenGL VBO, map it to CUDA, and write the vertex data to it in the format below:
V0x, V0y, V0z, N0x, N0y, N0z, V1x, V1y, V1z, N1x, N1y, N1z, ...
and draw it using glDrawArrays().
Redundant Vertices in VBO, Redundant Vertices per Cube, No Indices.
- Take the mesh from step 1, use thrust::sort() and thrust::unique() to remove redundant vertices, and compute indices using thrust::lower_bound(). Save the results to an OpenGL VBO/IBO mapped to CUDA. Draw the model using glDrawElements().
No Redundant Vertices in VBO, Generated Indices.
- Generate a unique list of vertices per cube, and store them in the VBO along with the indices that form triangles in the IBO. Render using glDrawElements().
Redundant Vertices in VBO, Unique Vertices per Cube, Generated Indices per Cube
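For reference, the deduplication in the second method can be sketched on the host roughly as follows. This is only a CPU analogue: it uses std::sort, std::unique, and std::lower_bound in place of their Thrust counterparts. The Vertex and IndexedMesh types and the buildIndexedMesh name are hypothetical, normals are left out for brevity, and it assumes that duplicated vertices are bitwise identical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical vertex type: position only (normals omitted for brevity).
struct Vertex {
    float x, y, z;
};

// Lexicographic order, so identical positions end up adjacent after sorting.
inline bool operator<(const Vertex& a, const Vertex& b) {
    return std::tie(a.x, a.y, a.z) < std::tie(b.x, b.y, b.z);
}

// Exact comparison: assumes duplicates are bitwise identical.
inline bool operator==(const Vertex& a, const Vertex& b) {
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

// Hypothetical container for the deduplicated result.
struct IndexedMesh {
    std::vector<Vertex> vertices;        // unique vertices, sorted
    std::vector<std::uint32_t> indices;  // one index per input vertex
};

// Turns a triangle soup (3 vertices per triangle) into an indexed mesh.
IndexedMesh buildIndexedMesh(const std::vector<Vertex>& soup) {
    IndexedMesh mesh;

    // Sort a copy and drop duplicates (the sort + unique step).
    mesh.vertices = soup;
    std::sort(mesh.vertices.begin(), mesh.vertices.end());
    mesh.vertices.erase(
        std::unique(mesh.vertices.begin(), mesh.vertices.end()),
        mesh.vertices.end());

    // Look up each original vertex in the sorted list (the lower_bound step).
    mesh.indices.reserve(soup.size());
    for (const Vertex& v : soup) {
        auto it = std::lower_bound(mesh.vertices.begin(), mesh.vertices.end(), v);
        assert(it != mesh.vertices.end() && *it == v);
        mesh.indices.push_back(
            static_cast<std::uint32_t>(it - mesh.vertices.begin()));
    }
    return mesh;
}
```

Note that in this sketch the triangles keep their original order, but the vertex buffer is ordered by the sort key rather than by where the vertices are used.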
Now, the FPS I get for the same dataset at the same iso-value is:
Method 1 : 92 FPS, 30,647,016 Verts, 0 Indices
Method 2 : 122 FPS, 6,578,066 Verts, 30,647,016 Indices
Method 3 : 140 FPS, 20,349,880 Verts, 30,647,016 Indices
Even though Method 2 yields the fewest vertices, its FPS is lower than that of Method 3. I believe this is because its indices are in an order that makes poor use of the GPU cache. The index order in Method 3 makes better use of the GPU cache, hence the higher FPS.
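One way this hypothesis could be checked offline is to estimate how often an index order would miss in a vertex cache. The sketch below computes an average cache miss ratio (misses per triangle) under a simple FIFO cache model. The FIFO model, the cacheSize parameter, and the averageCacheMissRatio name are assumptions made for illustration; real GPUs may behave differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Estimates vertex-cache misses per triangle for an index buffer,
// assuming a FIFO cache that holds cacheSize vertex indices.
// 3.0 means no vertex reuse at all; lower values mean more reuse.
double averageCacheMissRatio(const std::vector<std::uint32_t>& indices,
                             std::size_t cacheSize) {
    assert(cacheSize > 0);
    assert(indices.size() % 3 == 0);
    if (indices.empty()) {
        return 0.0;
    }

    std::deque<std::uint32_t> fifo;  // oldest entry at the front
    std::size_t misses = 0;

    for (std::uint32_t idx : indices) {
        // A hit leaves the FIFO unchanged; a miss inserts the index
        // and evicts the oldest entry once the cache is full.
        if (std::find(fifo.begin(), fifo.end(), idx) == fifo.end()) {
            ++misses;
            fifo.push_back(idx);
            if (fifo.size() > cacheSize) {
                fifo.pop_front();
            }
        }
    }

    const std::size_t triangleCount = indices.size() / 3;
    return static_cast<double>(misses) / static_cast<double>(triangleCount);
}
```

Running this on the index buffers of Method 2 and Method 3 with the same cacheSize would give a rough comparison of how much vertex reuse each order allows under this model.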
How can I modify Method 2 to yield a higher FPS?