In opengl there is one world coordinate system with origin (0,0,0).
Well, technically no.
What confuses me is what all the transformations like glTranslate, glRotate, etc. do? Do they move objects in world coordinates, or do they move the camera?
Neither. OpenGL doesn't know objects, OpenGL doesn't know a camera, OpenGL doesn't know a world. All that OpenGL cares about are primitives, points, lines or triangles, per vertex attributes, normalized device coordinates (NDC) and a viewport, to which the NDC are mapped to.
When you tell OpenGL to draw a primitive, each vertex is processed according to its attributes. The position is one of the attributes and usually a vector with 1 to 4 scalar elements within local "object" coordinate system. The task at hand is to somehow transform the local vertex position attribute into a position on the viewport. In modern OpenGL this happens within a small program, running on the GPU, called a vertex shader. The vertex shader may process the position in an arbitrary way. But the usual approach is by applying a number of nonsingular, linear transformations.
Such transformations can be expressed in terms of homogenous transformation matrices. For a 3 dimensional vector, the homogenous representation in a vector with 4 elements, where the 4th element is 1.
In computer graphics a 3-fold transformation pipeline has become sort of the standard way of doing things. First the object local coordinates are transformed into coordinates relative to the virtual "eye", hence into eye space. In OpenGL this transformation used to be called the modelview transformaion. With the vertex positions in eye space several calculations, like illumination can be expressed in a generalized way, hence those calculations happen in eye space. Next the eye space coordinates are tranformed into the so called clip space. This transformation maps some volume in eye space to a specific volume with certain boundaries, to which the geometry is clipped. Since this transformation effectively applies a projection, in OpenGL this used to be called the projection transformation.
After clip space the positions get "normalized" by their homogenous component, yielding normalized device coordinates, which are then plainly mapped to the viewport.
To recapitulate:
A vertex position is transformed from local to clip space by
vpos_eye = MV · vpos_local
eyespace_calculations(vpos_eye);
vpos_clip = P · vpos_eye
·: inner product column on row vector
Then to reach NDC
vpos_ndc = vpos_clip / vpos_clip.w
and finally to the viewport (NDC coordinates are in the range [-1, 1]
vpos_viewport = (vpos_ndc + (1,1,1,1)) * (viewport.width, viewport.height) / 2 + (viewport.x, viewport.y)
*: vector component wise multiplication
The OpenGL functions glRotate, glTranslate, glScale, glMatrixMode merely manipulate the transformation matrices. OpenGL used to have four transformation matrices:
- modelview
- projection
- texture
- color
On which of them the matrix manipulation functions act on can be set using glMatrixMode. Each of the matrix manipulating functions composes a new matrix by multiplying the transformation matrix they describe on top of the select matrix thereby replacing it. The functions glLoadIdentity replace the current matrix with identity, glLoadMatrix replaces it with a user defined matrix, and glMultMatrix multiplies a user defined matrix on top of it.
So how does the modelview matrix then emulate both object placement and a camera. Well, as you already stated
As you know, the same movement can be achieved by either moving objects or camera.
You can not really discern between them. The usual approach is by splitting the object local to eye transformation into two steps:
- Object to world – OpenGL calls this the "model transform"
- World to eye – OpenGL calls this the "view transform"
Together they form the model-view, in fixed function OpenGL described by the modelview matrix. Now since the order of transformations is
- local to world, Model matrix
vpos_world = M · vpos_local
- world to eye, View matrix
vpos_eye = V · vpos_world
we can substitute by
vpos_eye = V · ( M · vpos_local ) = V · M · vpos_local
replacing V · M
by the ModelView matrix =: MV
vpos_eye = MV · vpos_local
Thus you can see that what's V and what's M of the compund matrix M is only determined by the order of operations in which you multiply onto the modelview matrix, and at which step you decide to "call it the model transform from here on".
I.e. right after a
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
the view is defined. But at some point you'll start applying model transformations and everything after is model.
Note that in modern OpenGL all the matrix manipulation functions have been removed. OpenGL's matrix stack never was feature complete and no serious application did actually use it. Most programs just glLoadMatrix
-ed their self calculated matrices and didn't bother with the OpenGL built-in matrix maniupulation routines.
And ever since shaders were introduced, the whole OpenGL matrix stack got awkward to use, to say it nicely.
The verdict: If you plan on using OpenGL the modern way, don't bother with the built-in functions. But keep in mind what I wrote, because what your shaders do will be very similar to what OpenGL's fixed function pipeline did.