I am testing the rendering of extremely large 3d meshes, and I am currently testing on an iPhone 5 (I also have an iPad 3).
I have here two screenshots of Instruments with a profiling run. The first one is rendering a 1.3M vertex mesh, and the second is rendering a 2.1M vertex mesh.
The blue histogram-bar at the top shows CPU load, and it can be seen that for the first mesh is hovering at around ~10% CPU load so the GPU is doing most of the heavy lifting. The mesh is very detailed and my point-light-with-specular shader makes it look quite impressive if I say so myself, as it is able to render consistently above 20 frames per second. Oh, and 4x MSAA is enabled as well!
However, once I step up to a 2 million+ vertex mesh, everything goes to crap as we see here a massive CPU bound situation, and all instruments report 1 frame per second performance.
So, it's pretty clear that somewhere between these two assets (and I will admit that they are both tremendously large meshes to be loading in under one single VBO), whether it is the vertex buffer size or the index buffer size that is over the limit, some limit is being surpassed by the 2megavertex (462K tris) mesh.
So, the question is, what is this limit, and how can I query it? It would really be very preferable if I can have some reasonable assurance that my app will function well without exhaustively testing every device.
I also see an alternative approach to this problem, which is to stick to a known good VBO size limit (I have read about 4MB being a good limit), and basically just have the CPU work a little bit harder if the mesh being rendered is monstrous. With a 100MB VBO, having it in 4MB chunks (segmenting the mesh into 25 draw calls) does not really sound that bad.
But, I'm still curious. How can I check the max size, in order to work around the CPU fallback? Could I be running into an out of memory condition, and Apple is simply applying a CPU based workaround (oh LORD have mercy, 2 million vertices in immediate mode...)?