I'm drawing a small number of fairly large meshes (terrain built from SRTM data, in fact) in an Android app, and I would like it to run faster (or at least acceptably on slower devices). Currently I'm using one buffer of floats for vertex positions and one buffer of bytes for normals.

Would combining the 2 buffers into 1 improve performance? I've found a few posts on here and in blogs claiming that a single buffer will be better, but with little hard evidence.

The test case uses 6 separate meshes, each with 65k vertices and 128k triangles. (I am using glDrawElements, as each vertex is used in up to 8 triangles.)

The colors (so far) are calculated from the point heights in the vertex shader, so I don't need to pass in color info as attribute data.

This is all Java code running in a standard Android VM.

The fragment shader is a trivial pass-through:

precision mediump float;
varying vec4 v_Colour;

void main() {
  gl_FragColor = v_Colour;
}

The vertex shader is:

uniform mat4 u_MVPMatrix;
uniform mat4 u_MVMatrix;
uniform float u_many[8];

const int inflightx = 0; //vector to light in u_many
const int inflighty = 1;
const int inflightz = 2;

const int ambfactor = 3;

const vec3 basecolour1 = vec3(28.0 / 255.0, 188.0 / 255.0, 108.0 / 255.0);
const vec3 basecolour2 = vec3(150.0 / 255.0, 75.0 / 255.0, 0.0);
const vec3 basecolour3 = vec3(0.85, 0.85, 0.85);

attribute vec4 a_vpos;
attribute vec3 a_vertexnormal;

varying vec4 v_Colour;

void main() {
  vec3 eyenormal = vec3(u_MVMatrix *vec4(a_vertexnormal, 0.0));
  eyenormal = eyenormal / length(eyenormal);

  vec3 basecolour;

  if (a_vpos.z < 100.0) {
    basecolour = basecolour1 + ((basecolour2-basecolour1) * a_vpos.z / 100.0);
  } else {
    basecolour = basecolour2 + ((basecolour3-basecolour2) * (a_vpos.z - 100.0) / 500.0);
  }

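  // Diffuse term from the light vector packed in u_many, plus the ambient factor.
  vec3 tolight = vec3(u_many[inflightx], u_many[inflighty], u_many[inflightz]);
  float intensity = dot(eyenormal, tolight) + u_many[ambfactor];
  v_Colour = vec4(intensity * basecolour, 1.0);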

  gl_Position = u_MVPMatrix * a_vpos;

}

2 Answers

1
votes

Storing each vertex attribute in a separate buffer is called a structure of arrays. Storing each vertex attribute consecutively in the same buffer is called an array of structures.

In theory, storing vertices as an array of structures is the most efficient layout, because every attribute of a given vertex sits next to the others in memory and can be read sequentially when that vertex is processed. A structure of arrays is advisable only when your program needs to modify some of the attributes every frame but not all of them (for instance the vertex position but not the normal), since you can then update just the data that changes without rewriting the whole vertex buffer.

But that is the theory; in your case you probably don't have enough data to actually see a difference.

Also, if you still want to use two buffers, I would advise keeping 4-byte alignment for the normal buffer (if you can afford the memory); performance might suffer if your vertex size is not a multiple of 4 bytes.
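
To illustrate the alignment advice for the two-buffer case, here is a minimal Java sketch that widens tightly packed 3-byte normals to 4 bytes each. The class and method names (NormalPadding, padNormals) are made up for this example, and it assumes the normals are stored as signed bytes, x then y then z, per vertex.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the asker's code.
final class NormalPadding {
    /** Bytes per normal after padding: x, y, z plus one unused byte. */
    static final int PADDED_NORMAL_BYTES = 4;

    /**
     * Copies tightly packed signed-byte normals (x, y, z per vertex) into a
     * direct buffer that uses 4 bytes per vertex, so every normal starts on
     * a 4-byte boundary. The fourth byte is padding and is left as zero.
     */
    static ByteBuffer padNormals(byte[] packed) {
        if (packed.length % 3 != 0) {
            throw new IllegalArgumentException("expected 3 bytes per normal");
        }
        int vertexCount = packed.length / 3;
        ByteBuffer out = ByteBuffer.allocateDirect(vertexCount * PADDED_NORMAL_BYTES)
                .order(ByteOrder.nativeOrder());
        for (int v = 0; v < vertexCount; v++) {
            out.put(packed[3 * v]);
            out.put(packed[3 * v + 1]);
            out.put(packed[3 * v + 2]);
            out.put((byte) 0); // padding byte, never read by the shader
        }
        out.flip(); // position = 0, limit = bytes written, ready for upload
        return out;
    }
}
```

With a buffer like this, the normal attribute would be described to glVertexAttribPointer with size 3, type GL_BYTE and a stride of 4 instead of 3. The cost is one extra byte per vertex.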

0
votes

I've tested both methods and there is no significant difference in performance on the 2 devices I've tried - Galaxy S2 and Nexus 7. Here's the detailed info:

dual buffers: vertex data as triples of floats, normal data as triples of bytes, tightly packed.

single buffer: stride 16 bytes, triple of floats, triple of bytes, 1 byte padding.
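
For reference, the single-buffer layout described above could be built roughly as follows. This is only a sketch under assumptions: the names (InterleavedMesh, interleave) are illustrative, and the inputs are assumed to be the same float triples and byte triples used in the dual-buffer version.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not the code that was actually benchmarked.
final class InterleavedMesh {
    static final int STRIDE = 16;          // bytes per vertex
    static final int POSITION_OFFSET = 0;  // 3 floats = 12 bytes
    static final int NORMAL_OFFSET = 12;   // 3 bytes, followed by 1 byte of padding

    /** Packs positions (3 floats per vertex) and normals (3 bytes per vertex) into one buffer. */
    static ByteBuffer interleave(float[] positions, byte[] normals) {
        if (positions.length % 3 != 0 || positions.length != normals.length) {
            throw new IllegalArgumentException("expected 3 floats and 3 bytes per vertex");
        }
        int vertexCount = positions.length / 3;
        ByteBuffer out = ByteBuffer.allocateDirect(vertexCount * STRIDE)
                .order(ByteOrder.nativeOrder());
        for (int v = 0; v < vertexCount; v++) {
            int base = v * STRIDE;
            for (int c = 0; c < 3; c++) {
                out.putFloat(base + POSITION_OFFSET + 4 * c, positions[3 * v + c]);
                out.put(base + NORMAL_OFFSET + c, normals[3 * v + c]);
            }
            // The byte at base + 15 is padding and stays zero.
        }
        return out; // absolute puts leave position at 0 and limit at capacity
    }
}
```

Both attributes would then be set up with a stride of 16, the position at byte offset 0 and the normal at byte offset 12.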

Result info: I timed this at the entry to and exit from onDrawFrame in my GLSurfaceView.Renderer, grabbing SystemClock.uptimeMillis() and SystemClock.currentThreadTimeMillis() at each entry and exit to see how long each call spends in my code as well as the actual CPU time. Times are in ms; fps is frames per second.
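
A measurement like this can be sketched with a small accumulator that is fed the two clock readings on entry and exit. The class below is hypothetical (FrameStats and its method names are not from the original code), and the exact formulas are assumptions: fps is taken as frames divided by the wall-clock span from the first entry to the last exit, and the CPU percentage as thread CPU time divided by wall-clock time spent inside the callback.

```java
// Hypothetical accumulator; the formulas for fps and cpu% are assumptions.
final class FrameStats {
    private long frames;
    private long firstEntryWall, lastExitWall;
    private long entryWall, entryCpu;
    private long totalWall, totalCpu;
    private long min = Long.MAX_VALUE, max = Long.MIN_VALUE;

    /** Call at the top of the draw callback with wall-clock and thread-CPU readings in ms. */
    void enter(long wallMs, long cpuMs) {
        if (frames == 0) firstEntryWall = wallMs;
        entryWall = wallMs;
        entryCpu = cpuMs;
    }

    /** Call at the bottom of the draw callback with fresh readings. */
    void exit(long wallMs, long cpuMs) {
        long wall = wallMs - entryWall;
        min = Math.min(min, wall);
        max = Math.max(max, wall);
        totalWall += wall;
        totalCpu += cpuMs - entryCpu;
        lastExitWall = wallMs;
        frames++;
    }

    long minMs() { return min; }
    long maxMs() { return max; }
    double averageMs() { return frames == 0 ? 0 : (double) totalWall / frames; }

    /** Frames per second over the span from first entry to last exit. */
    double fps() {
        long span = lastExitWall - firstEntryWall;
        return span <= 0 ? 0 : frames * 1000.0 / span;
    }

    /** Share of the time inside the callback that was actual CPU time, in percent. */
    double cpuPercent() { return totalWall == 0 ? 0 : 100.0 * totalCpu / totalWall; }
}
```

On Android the readings passed to enter and exit would come from SystemClock.uptimeMillis() and SystemClock.currentThreadTimeMillis(). Taking them as parameters keeps the class usable off-device.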

Galaxy S2 using 2 buffers:

max:74.0 min:65.0 avg:66.0 fps:9.883199 from 33 frames cpu: 15.86144%

max:77.0 min:65.0 avg:67.0 fps:9.88616 from 33 frames cpu: 15.955056%

max:84.0 min:54.0 avg:64.0 fps:10.197961 from 34 frames cpu: 13.786848%

Galaxy S2 using 1 buffer:

max:85.0 min:64.0 avg:68.0 fps:9.806835 from 33 frames cpu: 15.654102%

max:76.0 min:55.0 avg:64.0 fps:10.36116 from 35 frames cpu: 17.079645%

max:85.0 min:64.0 avg:68.0 fps:9.806835 from 33 frames cpu: 15.654102%

Nexus 7 using 2 buffers:

max:14.0 min:2.0 avg:4.0 fps:19.642859 from 66 frames cpu: 72.22222%

max:9.0 min:1.0 avg:3.0 fps:19.725044 from 66 frames cpu: 82.35294%

max:7.0 min:2.0 avg:3.0 fps:19.689737 from 66 frames cpu: 82.30453%

Nexus 7 using combined buffer:

max:7.0 min:2.0 avg:3.0 fps:19.881306 from 67 frames cpu: 80.0%

max:22.0 min:1.0 avg:4.0 fps:19.828352 from 67 frames cpu: 62.26994%

max:9.0 min:2.0 avg:3.0 fps:19.904932 from 67 frames cpu: 80.15564%