3D rendering in OpenGL: model/view/projection vs translation/rotation/camera matrices

Question

I want to overlay a mesh model (let's say a cube) on a frame captured from a camera.

I also know all the information about where to put the cube:

Translation matrix - relative to the camera
Rotation matrix - relative to the camera
Camera calibration matrix - focal length, principal point, etc. (intrinsic parameters)

How can I convert this information to model/view/projection matrices?

What values should these matrices be set to?

For example, let's say that I want to display the point [x, y, z, 1] on the screen. That should be something like: [u, v, 1] = K * [R | T] * [x, y, z, 1] (up to a homogeneous scale factor, i.e. after dividing by the third component), where:

u, v are the coordinates on the screen (or in the camera capture) and:

K, R and T are intrinsic camera parameters, rotation and translation, respectively.
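
To make the formula concrete, here is a minimal sketch of that projection in plain Python. The numbers in K, R and T are made up for illustration (a 640x480 image with an 800-pixel focal length, identity rotation, and the object 5 units in front of the camera), and the helper names `mat_vec` and `project_pinhole` are hypothetical:

```python
def mat_vec(m, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def project_pinhole(K, R, T, point):
    """Project a 3D point to pixel coordinates with K * [R | T]."""
    # Camera-space coordinates: Xc = R * X + T
    cam = [c + t for c, t in zip(mat_vec(R, point), T)]
    # Homogeneous image coordinates (s*u, s*v, s)
    uvw = mat_vec(K, cam)
    # Divide by the third component to get the actual pixel position
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

# Assumed example values, not taken from a real calibration
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 5.0]

u, v = project_pinhole(K, R, T, [0.5, -0.25, 0.0])
print(u, v)  # 400.0 200.0
```

Note the final division: the product K * [R | T] * [x, y, z, 1] gives [s*u, s*v, s], so u and v only appear after dividing by s.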

How to convert K, R, T to model/view/projection matrices?

Yakov Galka · Accepted Answer · 2016-11-28T21:17:38

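[R | T] would be your model-view matrix and K would be the basis of your projection matrix (K has to be extended to a 4x4 matrix that also maps depth into the near/far clip range).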

The model-view matrix is usually one matrix. The separation is only conceptual: Model transforms from model coordinates to world coordinates, and View from world coordinates to camera (not-yet-projected) coordinates. The separation makes sense in applications where the camera and the objects move independently of each other. In your case, on the other hand, the camera can be considered fixed and everything else described relative to the camera. So you only have to deal with two matrices: model-view and projection.
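
As a rough sketch of how the two matrices could be built, the Python below pads [R | T] to 4x4 for the model-view matrix and expands K into a 4x4 OpenGL-style projection matrix. Several things here are assumptions rather than part of the answer: the function names are hypothetical, K has zero skew, pixel (0, 0) is the top-left corner of the image, the camera frame follows the usual computer-vision convention (x right, y down, z forward), and `near`/`far` are clip distances you choose yourself. The y flip and the forward-pointing z axis are folded into the projection matrix, so [R | T] can be used unchanged.

```python
def mat_vec(m, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def model_view_from_rt(R, T):
    """Pad the 3x4 matrix [R | T] with a (0, 0, 0, 1) row."""
    return [list(R[i]) + [T[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def projection_from_k(K, width, height, near, far):
    """Build a 4x4 projection matrix from intrinsics (zero skew assumed)."""
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    return [
        # NDC x = 2u/width - 1
        [2.0 * fx / width, 0.0, 2.0 * cx / width - 1.0, 0.0],
        # NDC y = 1 - 2v/height (image rows grow downward, NDC y grows upward)
        [0.0, -2.0 * fy / height, 1.0 - 2.0 * cy / height, 0.0],
        # Depth: z = near maps to -1 and z = far maps to +1
        [0.0, 0.0, (far + near) / (far - near), -2.0 * far * near / (far - near)],
        # Clip w is the camera-space depth (positive in front of the camera)
        [0.0, 0.0, 1.0, 0.0],
    ]

def ndc_to_pixel(ndc_x, ndc_y, width, height):
    """Convert NDC to pixel coordinates with the origin at the top-left."""
    return (ndc_x + 1.0) * 0.5 * width, (1.0 - ndc_y) * 0.5 * height

# Assumed example values, not taken from a real calibration
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 5.0]
width, height = 640, 480

MV = model_view_from_rt(R, T)
P = projection_from_k(K, width, height, near=0.1, far=100.0)

# Emulate what the vertex stage does: clip = P * MV * vertex
eye = mat_vec(MV, [0.5, -0.25, 0.0, 1.0])
clip = mat_vec(P, eye)
ndc = [c / clip[3] for c in clip[:3]]
u_gl, v_gl = ndc_to_pixel(ndc[0], ndc[1], width, height)
print(u_gl, v_gl)  # close to (400, 200), the same pixel K * [R | T] gives
```

The matrices above are written row by row, while OpenGL expects column-major storage, so they would need to be transposed before upload (or uploaded with the transpose flag of `glUniformMatrix4fv` set, where that is supported). Half-pixel offsets in the pixel-center convention are ignored in this sketch.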
