4 votes

I am trying to understand how cameras work in OpenGL using matrices.

I've written a simple shader that looks like this:

#version 330 core

layout (location = 0) in vec3 a_pos;
layout (location = 1) in vec4 a_col;

uniform mat4 u_mvp_mat;
uniform mat4 u_mod_mat;
uniform mat4 u_view_mat;
uniform mat4 u_proj_mat;

out vec4 f_color;

void main()
{
    vec4 v = u_mvp_mat * vec4(0.0, 0.0, 1.0, 1.0); // computed for testing; currently unused
    gl_Position = u_mvp_mat * vec4(a_pos, 1.0);
    //gl_Position = u_proj_mat * u_view_mat * u_mod_mat * vec4(a_pos, 1.0);
    f_color = a_col;
}

It's a bit verbose, but that's because I am testing two approaches: passing in the model, view, and projection matrices separately and doing the multiplication on the GPU, or doing the multiplication on the CPU and passing in a single MVP matrix so the shader only does the mvp * position multiplication.

I understand that the latter can offer a performance increase, but since I'm only drawing one quad I don't see any performance issues at this point.

Right now I use this code to get the locations from my shader and to create the model, view, and projection matrices.

pos_loc = get_attrib_location(ce_get_default_shader(), "a_pos");
col_loc = get_attrib_location(ce_get_default_shader(), "a_col");
mvp_matrix_loc = get_uniform_location(ce_get_default_shader(), "u_mvp_mat");
model_mat_loc = get_uniform_location(ce_get_default_shader(), "u_mod_mat");
view_mat_loc = get_uniform_location(ce_get_default_shader(), "u_view_mat");
proj_matrix_loc = get_uniform_location(ce_get_default_shader(), "u_proj_mat");

float h_w = (float)ce_get_width() * 0.5f;  //width = 320
float h_h = (float)ce_get_height() * 0.5f; //height = 480

model_mat = mat4_identity();
view_mat = mat4_identity();
proj_mat = mat4_identity();

point3* eye = point3_new(0, 0, 0);
point3* center = point3_new(0, 0, -1);
vec3* up = vec3_new(0, 1, 0);

mat4_look_at(view_mat, eye, center, up);
mat4_translate(view_mat, h_w, h_h, -20);

mat4_ortho(proj_mat, 0, ce_get_width(), 0, ce_get_height(), 1, 100);

mat4_scale(model_mat, 30, 30, 1);

mvp_mat = mat4_identity();

After this I set up my VAO and VBOs and then get ready to render.

glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);
glUseProgram(ce_get_default_shader()->shader_program);
glBindVertexArray(vao);

mvp_mat = mat4_multi(mvp_mat, view_mat, model_mat); /* mvp = view * model */
mvp_mat = mat4_multi(mvp_mat, proj_mat, mvp_mat);   /* mvp = proj * view * model */

glUniformMatrix4fv(mvp_matrix_loc, 1, GL_FALSE, mat4_get_data(mvp_mat));

glUniformMatrix4fv(model_mat_loc, 1, GL_FALSE, mat4_get_data(model_mat));
glUniformMatrix4fv(view_mat_loc, 1, GL_FALSE, mat4_get_data(view_mat));
glUniformMatrix4fv(proj_matrix_loc, 1, GL_FALSE, mat4_get_data(proj_mat));

glDrawElements(GL_TRIANGLES, quad->vertex_count, GL_UNSIGNED_SHORT, 0);
glBindVertexArray(0);

Assuming all the matrix math is correct, I would like to abstract the view and projection matrices out into a camera struct, and the model matrix into a sprite struct, so that I can avoid all this matrix math and make things easier to use.

The matrix multiplication order is:

Projection * View * Model * Vector

so the camera would hold the projection and view matrices while the sprite holds the model matrix.

Do all your camera transformations and your sprite transformations; then, right before you send the data to the GPU, do the matrix multiplications.

If I remember correctly, matrix multiplication isn't commutative, so doing view * projection * model would produce the wrong matrix.

pseudo code

glClearxxx(....);
glUseProgram(..);
glBindVertexArray(..);

mvp_mat = mat4_identity();
proj_mat = camera_get_proj_mat();
view_mat = camera_get_view_mat();
mod_mat  = sprite_get_transform_mat();

mat4_multi(mvp_mat, view_mat, mod_mat); //mvp holds view * model
mat4_multi(mvp_mat, proj_mat, mvp_mat); //mvp holds proj * view * model

glUniformMatrix4fv(mvp_matrix_loc, 1, GL_FALSE, mat4_get_data(mvp_mat));

glDrawElements(...);
glBindVertexArray(0);

Is that a performant way to go about doing this that is scalable?


2 Answers

2 votes

Is that a performant way to go about doing this that is scalable?

Yes, unless you have a very exotic use case that is far from the norm.

The last thing you should typically ever worry about is the performance of retrieving the modelview and projection matrices out of a camera.

That's because those matrices typically only need to be fetched once per frame per viewport. There are millions of iterations' worth of other work that can occur in a frame while scanline-rasterizing primitives, and pulling matrices out of a camera is a simple constant-time operation.

So typically you want to just make it as convenient as you like. In my case, I go all the way through an abstract interface of function pointers in a central SDK, where the functions compute the proj/mv/ti_mv matrices on the fly from user-defined properties associated with the camera. In spite of this, it never shows up as a hotspot -- it doesn't even show up in the profiler at all.

There are far more expensive things to worry about. Scalability implies scale, and the complexity of retrieving matrices out of a camera doesn't scale. The number of triangles, quads, lines, or other primitives you render can scale, and the number of fragments processed in a fragment shader can scale. Cameras typically don't scale except with respect to the number of viewports, and no one should ever need a million viewports.

2 votes

I haven't checked it bit by bit, but what you're doing generally looks OK.

I would like to abstract view and projection matrix out into a camera struct

That's a most appropriate idea; I can hardly imagine a serious GL application without such an abstraction.

Is that a performant way to go about doing this that is scalable?

General constraints on scalability are:

  • diffuse and specular BRDFs, which need per-pixel illumination for quality rendering (and which also require, by the way, a light uniform, a normal attribute, and the calculation of a normal matrix if the scaling of the model is non-uniform)

  • same with multiple lights (e.g. the sun and a close spotlight)

  • shadow maps (one for each light source?)

  • transparency

  • reflections (mirrors, glass, water)

  • textures

As you can gather from the list, you will not get very far with just an MVP uniform and a vertex coordinate attribute.

But the mere number of uniforms is by far not the most crucial point for performance; seeing your code, I'm positive that you will not recompile your shaders unnecessarily, that you will update your uniforms only when needed, that you will use Uniform Buffer Objects, etc.

The issue is the data that is plugged into those uniforms and VBOs. Or not.


Consider a humanoid mesh "Alice" running (that's a mesh morph + translation) across a city square on a windy (the water will have ripples) evening (more than one relevant light source), passing a fountain.

Let's assume we compute it all on the CPU, old-school, and only plug ready-to-render data into the shaders:

  • Alice's mesh is morphed, thus her VBOs need an update
  • Alice's mesh moves; thus all affected shadow maps need an update (OK, granted, they are generated by shadow passes on the GPU, but if you do it the wrong way you will shove a lot of data around)
  • Alice's reflection in the fountain will come and go
  • Alice's hair will be swirled around - the CPU may have quite a busy time, to say the least

(In fact the latter is so difficult that you will hardly ever see halfway-realistic real-time animation of long, open hair, but amazingly (no, not really) plenty of ponytails and short haircuts.)

And we've not yet talked about Alice's attire; let's just hope she's wearing a t-shirt and jeans (not a wide shirt and a skirt, which would require fall-of-the-folds and collision calculations).

As you may have guessed, that old-school approach doesn't take us far, and thus there is a fit to be found between CPU and GPU operations.

In addition, one should think about parallelizing calculations at an early stage. It is advantageous to keep the data as flat as possible, in chunks as large as reasonable, so that one just puts a pointer and a size into a GL call and bids that data farewell without any copying, re-arranging, looping, or further ado.

That's my 2 cents of wisdom for today about GL performance and scalability.