Camera to world transformation and world to camera transformation

Question

I am kind of confused by the camera to world vs. world to camera transformation.

In the OpenGL rendering pipeline, the transformation is from the world to the camera, right? Basically, a view coordinate frame is constructed at the camera; the objects in the world are then first translated relative to the camera and then rotated into the camera coordinate frame. Is this what gluLookAt performs?
If I want to go from camera view back to world view, how should it be done? Mathematically, I am thinking of finding the inverses of the translation and rotation matrices and then applying the rotation before the translation, right?

Nico Schertler · Accepted Answer · 2013-02-25T21:38:26

Usually there are several transformations that map world positions to the screen. The most common ones are:

World transformation: Can be applied to objects in order to position and orient them relative to other objects.
View transformation: After this transformation the camera is at the origin O and looks along the z axis (in OpenGL's convention, down the negative z axis).
Projection transformation: Performs, e.g., a perspective transformation to simulate a real camera.
Viewport adaptation: This is basically a scaling and translation that maps positions from the range [-1, 1] to the viewport, which is usually the screen width and height.
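
The last step in this list is simple enough to sketch directly. The function below is only an illustration: the name ndc_to_viewport and its parameters are made up for this example, and the formula assumes the usual OpenGL-style viewport given by a lower-left corner plus a width and height.

```python
def ndc_to_viewport(x_ndc, y_ndc, x0, y0, width, height):
    """Map normalized device coordinates in [-1, 1] to window coordinates.

    (x0, y0) is the lower-left corner of the viewport; width and height
    are its size in pixels. All names here are illustrative assumptions.
    """
    # Shift [-1, 1] to [0, 2], scale to [0, size], then offset by the corner.
    x_win = x0 + (x_ndc + 1.0) * 0.5 * width
    y_win = y0 + (y_ndc + 1.0) * 0.5 * height
    return x_win, y_win


# The center of the NDC square lands in the middle of an 800x600 viewport.
center = ndc_to_viewport(0.0, 0.0, 0, 0, 800, 600)
```

The scaling and translation are both visible in the two assignment lines: a factor of half the viewport size and an offset by the viewport corner.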

gluLookAt is used to create a view transformation. You can imagine it as follows: Place the camera somewhere in your scene. Now transform the whole scene (together with the camera) so that the camera is at the origin, looks along the z axis (the negative z axis in OpenGL's convention), and the y axis represents the up direction. This is a simple rigid body transformation that can be represented as an arbitrary rotation (with three degrees of freedom) and an arbitrary translation (with another three degrees of freedom). Every rigid body transformation can be split into a separate rotation and translation. Even the order in which they are applied can vary, as long as you choose the values accordingly.

Transformations can be interpreted in different ways. I wrote a blog entry on that topic a while ago. If you're interested, take a look at it. Although it is written for DirectX, the maths is pretty much the same for OpenGL. You just have to watch out for transposed matrices.
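
To make the view transformation concrete, here is a minimal sketch of a look-at matrix in plain Python. It follows the construction documented for gluLookAt (a rotation built from the side, up and forward vectors, combined with a translation by the negated eye position), but it is not the library's code. The helper names (look_at, transform_point and the small vector functions) are assumptions for this example, matrices are stored as nested row lists, and points are treated as column vectors.

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [c / n for c in v]


def look_at(eye, center, up):
    """Build a 4x4 world-to-camera matrix (camera looks down -z)."""
    f = normalize(sub(center, eye))   # forward direction
    s = normalize(cross(f, up))       # side (camera x axis)
    u = cross(s, f)                   # corrected up (camera y axis)
    # Rotation rows are s, u, -f; the last column is the rotated -eye.
    return [
        [s[0], s[1], s[2], -dot(s, eye)],
        [u[0], u[1], u[2], -dot(u, eye)],
        [-f[0], -f[1], -f[2], dot(f, eye)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(m, p):
    """Apply a 4x4 matrix to a 3D point (implicit w = 1)."""
    x, y, z = p
    return [m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3)]


# Hypothetical camera at (1, 2, 3) looking straight down the world -z axis.
V = look_at([1.0, 2.0, 3.0], [1.0, 2.0, -5.0], [0.0, 1.0, 0.0])
eye_in_camera = transform_point(V, [1.0, 2.0, 3.0])      # approximately the origin
target_in_camera = transform_point(V, [1.0, 2.0, -5.0])  # approximately (0, 0, -8)
```

The camera position ends up at the origin and the target ends up on the negative z axis, which is exactly the "camera at O, looking along z" situation described above.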

For the second question: Yes, you are right. You need to find the inverse transformation. This can be easily achieved with the inverse matrix. If you specified the view matrix V as follows:

V = R_xyz * T_xyz

then the inverse transformation V^-1 is

V^-1 = T_xyz^-1 * R_xyz^-1
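
As a quick numerical check of this identity, the sketch below builds a hypothetical view matrix from a rotation about the y axis and a translation, then inverts it factor by factor. The helper names and the concrete camera values are assumptions chosen for illustration; matrices are nested row lists acting on column vectors, so V = R * T means "translate first, then rotate".

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested row lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def transpose(m):
    return [[m[j][i] for j in range(4)] for i in range(4)]


# Hypothetical camera located at (1, 2, 3), turned 30 degrees about y.
R = rotation_y(math.radians(30.0))
T = translation(-1.0, -2.0, -3.0)   # moves the camera position to the origin
V = matmul(R, T)                    # world -> camera

R_inv = transpose(R)                # a pure rotation is inverted by transposing
T_inv = translation(1.0, 2.0, 3.0)  # a translation is inverted by negating it
V_inv = matmul(T_inv, R_inv)        # camera -> world, factors in reversed order

should_be_identity = matmul(V_inv, V)
```

Note that the last column of V_inv holds the camera position (1, 2, 3): the camera-to-world matrix places the camera in the scene, while V undoes that placement.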

However, this alone does not map screen positions to world positions, because the projection and viewport transformations are applied after the view transformation and would have to be inverted as well. I hope that answers your questions.

Here is another interesting point. The view matrix is the inverse of the transformation that would place a camera model (at the origin, facing in the z direction) at the specified position and orientation. This relation is called system transformation vs. model transformation.
