I am trying to understand (and carry out) all the basic mathematical computations needed in the graphics pipeline to render a simple 2D image from a 3D scene description such as VRML. Is there a good example of the steps needed: model transformation (object coordinates to world coordinates), view transformation (world coordinates to view coordinates), calculation of vertex normals for lighting, clipping, calculating the screen coordinates of objects inside the view frustum, and creating the 2D projection to calculate the individual pixels with colors?
2 Answers
I am used to OpenGL-style render math, so I will stick to it (all renderers use almost the same math anyway).
First, some terms to explain:
- Transform matrix
Represents a coordinate system in 3D space
double m[16]; // it is 4x4 matrix stored as 1 dimensional array for speed
m[0]=xx; m[4]=yx; m[ 8]=zx; m[12]=x0;
m[1]=xy; m[5]=yy; m[ 9]=zy; m[13]=y0;
m[2]=xz; m[6]=yz; m[10]=zz; m[14]=z0;
m[3]= 0; m[7]= 0; m[11]= 0; m[15]= 1;
where:
- X(xx,xy,xz) is the unit vector of the X axis in GCS (global coordinate system)
- Y(yx,yy,yz) is the unit vector of the Y axis in GCS
- Z(zx,zy,zz) is the unit vector of the Z axis in GCS
- P(x0,y0,z0) is the origin of the represented coordinate system in GCS
A transformation matrix is used to transform coordinates between GCS and LCS (local coordinate system). With column vectors, the same convention as v=projection*view*model*v in the pipeline below:
- LCS -> GCS: Ag = m * Al;
- GCS -> LCS: Al = (m^-1) * Ag;
where:
- Al (x,y,z,w=1) is a 3D point in LCS, in homogeneous coordinates
- Ag (x,y,z,w=1) is a 3D point in GCS, in homogeneous coordinates
- the homogeneous coordinate w=1 is added so we can multiply a 3D vector by a 4x4 matrix
- m is the transformation matrix
- m^-1 is the inverse transformation matrix
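Here is a minimal sketch of both directions, assuming the m[16] layout shown above (axes X, Y, Z and origin P stored as consecutive columns) and column vectors. The names Vec3, lcs_to_gcs and gcs_to_lcs are made up for illustration. The inverse direction assumes m is orthonormal, so no general matrix inversion is needed.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

// LCS -> GCS: Ag = m * Al with w = 1.
// Equivalent to Ag = X*x + Y*y + Z*z + P.
Vec3 lcs_to_gcs(const double m[16], const Vec3& al) {
    return {
        m[0] * al[0] + m[4] * al[1] + m[8]  * al[2] + m[12],
        m[1] * al[0] + m[5] * al[1] + m[9]  * al[2] + m[13],
        m[2] * al[0] + m[6] * al[1] + m[10] * al[2] + m[14],
    };
}

// GCS -> LCS: Al = m^-1 * Ag.
// Assumption: m is orthonormal, so the inverse rotation is the transpose.
// Subtract the origin, then project the result onto each axis.
Vec3 gcs_to_lcs(const double m[16], const Vec3& ag) {
    const double dx = ag[0] - m[12];
    const double dy = ag[1] - m[13];
    const double dz = ag[2] - m[14];
    return {
        m[0] * dx + m[1] * dy + m[2]  * dz,  // dot with X axis
        m[4] * dx + m[5] * dy + m[6]  * dz,  // dot with Y axis
        m[8] * dx + m[9] * dy + m[10] * dz,  // dot with Z axis
    };
}
```

For a matrix that contains scale or shear the shortcut in gcs_to_lcs does not hold and a full 4x4 inverse would be needed instead.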
In most cases m is orthonormal, which means the X, Y, Z vectors are perpendicular to each other and have unit length. This can be used to restore matrix accuracy after many rotations, translations, etc.
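One possible way to do that restoration is to rebuild the axes from cross products. This is only a sketch under my own assumptions: it keeps the direction of the Z axis and recomputes X and Y (which axis to trust is a design choice), and the function names are hypothetical.

```cpp
#include <cmath>

// Scale a 3-component vector to unit length (left unchanged if it is zero).
static void normalize3(double* v) {
    const double len = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    if (len > 0.0) { v[0] /= len; v[1] /= len; v[2] /= len; }
}

// out = a x b; out must not alias a or b.
static void cross3(const double* a, const double* b, double* out) {
    out[0] = a[1] * b[2] - a[2] * b[1];
    out[1] = a[2] * b[0] - a[0] * b[2];
    out[2] = a[0] * b[1] - a[1] * b[0];
}

// Re-orthonormalize the rotation part of a 4x4 matrix stored as m[16]
// with axes in m[0..2], m[4..6], m[8..10]. The origin m[12..14] is untouched.
void orthonormalize(double m[16]) {
    double* X = m + 0;
    double* Y = m + 4;
    double* Z = m + 8;
    normalize3(Z);     // keep the Z direction
    cross3(Y, Z, X);   // X = Y x Z, perpendicular to Z
    normalize3(X);
    cross3(Z, X, Y);   // Y = Z x X, unit length because Z and X are
}
```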
For more info see Understanding 4x4 homogenous transform matrices
- Render matrices
These matrices are usually used:
- model - represents the coordinate system of the object actually being rendered
- view - represents the camera coordinate system (the Z axis is the view direction)
- modelview - model and view multiplied together
- normal - the same as modelview but with x0,y0,z0 = 0, used for normal vector computations
- texture - manipulates texture coordinates for easy texture animation and effects; usually a unit matrix
- projection - represents the projection of the camera view (perspective, ortho, ...). It should not include any rotations or translations; it is more like a camera sensor calibration (otherwise fog and other effects will fail)
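To illustrate the normal matrix from the list above, here is a small sketch that copies modelview and zeroes its translation. The function names are hypothetical. It assumes modelview contains only rotation and translation (or uniform scale followed by renormalizing the result); with non-uniform scale, the inverse transpose of the 3x3 part is normally used instead.

```cpp
#include <algorithm>

// Build the "normal" matrix: modelview with x0, y0, z0 set to 0,
// so that vectors are rotated but not shifted.
void make_normal_matrix(const double modelview[16], double normal[16]) {
    std::copy(modelview, modelview + 16, normal);
    normal[12] = 0.0;
    normal[13] = 0.0;
    normal[14] = 0.0;
}

// Multiply a 4x4 matrix (axes stored as consecutive columns) by (x, y, z, 1)
// and keep the xyz part of the result.
void transform_w1(const double m[16], const double v[3], double out[3]) {
    for (int r = 0; r < 3; ++r)
        out[r] = m[r] * v[0] + m[4 + r] * v[1] + m[8 + r] * v[2] + m[12 + r];
}
```

Transforming a normal with modelview directly would add the translation to it, which is why the separate matrix is kept.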
- The rendering math
To render a 3D scene you need 2D rendering routines, such as drawing a 2D textured triangle. The renderer converts the 3D scene data to 2D and renders it. There are more techniques out there, but the most usual is a boundary model representation plus boundary rendering (surface only). The 3D -> 2D conversion is done by projection (orthogonal or perspective) and Z-buffer or Z-sorting.
- Z-buffer is easy and native to present-day gfx HW
- Z-sorting is done by the CPU instead, so it is slower and needs additional memory, but it is necessary for correct rendering of transparent surfaces.
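A very rough sketch of the Z-sorting idea: sort the primitives back to front and draw them in that order. The Tri struct and its fields are hypothetical, and I assume one representative depth per triangle (for example the centroid) where larger z means farther from the camera. That simplification breaks down for intersecting or cyclically overlapping triangles.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record: one representative camera-space depth per triangle
// plus an index into whatever structure holds the actual geometry.
struct Tri {
    double z;
    int id;
};

// Order triangles so the farthest one comes first (back to front).
// Assumption: larger z means farther from the camera.
void sort_back_to_front(std::vector<Tri>& tris) {
    std::stable_sort(tris.begin(), tris.end(),
                     [](const Tri& a, const Tri& b) { return a.z > b.z; });
}
```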
So the pipeline looks like this:
- obtain the actual rendered data from the model
  - Vertex v
  - Normal n
  - Texture coord t
  - Color, Fog coord, etc...
- convert it to the appropriate space
  - v=projection*view*model*v ... camera space + projection
  - n=normal*n ... global space
  - t=texture*t ... texture space
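The vertex conversion v=projection*view*model*v can be sketched like this, assuming 4x4 matrices stored column by column (as in the m[16] layout earlier) and column vectors. Mat4, Vec4 and the function names are my own placeholders, not any library's API.

```cpp
#include <array>

using Mat4 = std::array<double, 16>;  // element (row, col) is at [col * 4 + row]
using Vec4 = std::array<double, 4>;

// c = a * b
Mat4 mat_mul(const Mat4& a, const Mat4& b) {
    Mat4 c{};  // zero-initialized
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                c[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return c;
}

// r = m * v
Vec4 mat_vec(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int k = 0; k < 4; ++k)
            r[row] += m[k * 4 + row] * v[k];
    return r;
}

// v' = projection * view * model * v
// The model matrix is applied to the vertex first, the projection last.
Vec4 transform_vertex(const Mat4& projection, const Mat4& view,
                      const Mat4& model, const Vec4& v) {
    const Mat4 mvp = mat_mul(projection, mat_mul(view, model));
    return mat_vec(mvp, v);
}
```

In a real renderer the combined matrix would be computed once per object and reused for all of its vertices rather than rebuilt per vertex as in this sketch.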
- clip data to screen
This step is not strictly necessary, but it avoids rendering off-screen stuff (for speed). Face culling is usually done here too: if the normal vector of the rendered 'triangle' is opposite to what the configured polygon winding rule expects, the 'triangle' is ignored.
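The face culling test can be sketched in 2D after projection: the sign of the triangle's signed area tells which way it winds, which is the same as the sign of the z component of its normal. The names are hypothetical, and the sketch assumes front faces are counter-clockwise with the y axis pointing up; flip the comparison for the opposite setup.

```cpp
struct Vec2 {
    double x, y;
};

// Twice the signed area of triangle (a, b, c): the z component of
// the cross product (b - a) x (c - a).
double signed_area2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// True if the triangle should be skipped.
// Assumption: counter-clockwise means front-facing; degenerate
// (zero-area) triangles are skipped as well.
bool is_back_face(Vec2 a, Vec2 b, Vec2 c) {
    return signed_area2(a, b, c) <= 0.0;
}
```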
- render the 3D/2D data
Use only the v.x,v.y coordinates for screen rendering and v.z for the z-buffer test/value. The perspective division for perspective projections also goes here:
v.x/=v.z; v.y/=v.z;
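As a rough sketch of that division, the function below maps a camera-space point to pixel coordinates. Everything beyond the division itself is my assumption: the camera looks along +z, vz is greater than zero, focal is a hypothetical focal length in pixels, and the screen y axis points down. OpenGL itself divides by w after the projection matrix rather than by z directly.

```cpp
#include <cmath>

struct ScreenPoint {
    int x, y;   // pixel coordinates
    double z;   // depth kept for the z-buffer test
};

// Perspective division followed by a shift to the screen centre.
// Assumption: vz > 0, i.e. the point is in front of the camera.
ScreenPoint to_screen(double vx, double vy, double vz,
                      double focal, int width, int height) {
    const double px = focal * vx / vz;  // v.x /= v.z, scaled to pixels
    const double py = focal * vy / vz;  // v.y /= v.z, scaled to pixels
    return { static_cast<int>(std::lround(width * 0.5 + px)),
             static_cast<int>(std::lround(height * 0.5 - py)),
             vz };
}
```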
Z-buffer works like this: the Z-buffer (zed) is a 2D array with the same size (resolution) as the screen (scr). Any pixel scr[y][x] is rendered only if (zed[y][x]>=z); in that case scr[y][x]=color; zed[y][x]=z; The if condition can be different (it is changeable).
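A minimal sketch of that test, keeping the names scr and zed from the description. The FrameBuffer struct, the flat 1D storage and the initial depth of +infinity are my assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct FrameBuffer {
    int width, height;
    std::vector<std::uint32_t> scr;  // colors, one per pixel
    std::vector<double> zed;         // depths, one per pixel

    // Every depth starts at +infinity so the first fragment always passes.
    FrameBuffer(int w, int h)
        : width(w), height(h),
          scr(static_cast<std::size_t>(w) * h, 0u),
          zed(static_cast<std::size_t>(w) * h,
              std::numeric_limits<double>::infinity()) {}

    // Write the pixel only if it is not farther than what is already stored.
    // Returns true if the pixel was written.
    bool plot(int x, int y, double z, std::uint32_t color) {
        if (x < 0 || x >= width || y < 0 || y >= height) return false;
        const std::size_t i = static_cast<std::size_t>(y) * width + x;
        if (zed[i] >= z) {
            scr[i] = color;
            zed[i] = z;
            return true;
        }
        return false;
    }
};
```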
When triangles or higher primitives are used for rendering, the resulting 2D primitives are converted to pixels in a process called rasterization.
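One common way to rasterize a triangle is to walk its bounding box and test each pixel centre against the three edges. This is just a sketch of that edge-function approach, with hypothetical names and no special handling of pixels shared by neighbouring triangles.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>

struct P2 {
    double x, y;
};

// Positive if p lies to the left of the directed edge a -> b.
static double edge(P2 a, P2 b, P2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Calls put(x, y) for every pixel whose centre lies inside (or on the
// border of) triangle (a, b, c), clipped to a width x height screen.
void rasterize_triangle(P2 a, P2 b, P2 c, int width, int height,
                        const std::function<void(int, int)>& put) {
    const double area = edge(a, b, c);
    if (area == 0.0) return;           // degenerate triangle
    if (area < 0.0) std::swap(b, c);   // make all edge tests positive inside

    const int x0 = std::max(0, static_cast<int>(std::floor(std::min({a.x, b.x, c.x}))));
    const int x1 = std::min(width - 1, static_cast<int>(std::ceil(std::max({a.x, b.x, c.x}))));
    const int y0 = std::max(0, static_cast<int>(std::floor(std::min({a.y, b.y, c.y}))));
    const int y1 = std::min(height - 1, static_cast<int>(std::ceil(std::max({a.y, b.y, c.y}))));

    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            const P2 p{x + 0.5, y + 0.5};  // pixel centre
            if (edge(a, b, p) >= 0.0 && edge(b, c, p) >= 0.0 && edge(c, a, p) >= 0.0)
                put(x, y);
        }
    }
}
```

The put callback is where the z-buffer test and the color write would go; depth and texture coordinates would be interpolated per pixel from the three edge values.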
[Notes]
Transformation matrices are multiplicative, so if you need to transform N points by M matrices you can create a single matrix = m1*m2*...mM and convert the N points by this resulting matrix only (for speed). Sometimes a 3x3 transform matrix + shift vector is used instead of a 4x4 matrix. It is faster in some cases, but you cannot multiply several transformations together so easily. For transformation matrix manipulation, look for basic operations like Rotate or Translate. There are also matrices for rotations inside LCS, which are more suitable for human control input, but these are not native to renderers like OpenGL or DirectX (because they use the inverse matrix).
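To show what the 3x3 matrix + shift vector form looks like, and why chaining is less convenient than a single 4x4 multiply, here is a small sketch. The Affine3 struct and function names are hypothetical; the 3x3 part is stored column by column.

```cpp
#include <array>

struct Affine3 {
    std::array<double, 9> r;  // 3x3 matrix, element (row, col) at [col * 3 + row]
    std::array<double, 3> t;  // shift vector
};

// p' = r * p + t
std::array<double, 3> apply(const Affine3& a, const std::array<double, 3>& p) {
    std::array<double, 3> out{};
    for (int row = 0; row < 3; ++row)
        out[row] = a.r[row] * p[0] + a.r[3 + row] * p[1] + a.r[6 + row] * p[2] + a.t[row];
    return out;
}

// Combine two transforms so that the result applies b first, then a.
// Unlike the 4x4 case this needs two separate formulas:
//   r = a.r * b.r   and   t = a.r * b.t + a.t
Affine3 compose(const Affine3& a, const Affine3& b) {
    Affine3 c{};  // zero-initialized
    for (int col = 0; col < 3; ++col)
        for (int row = 0; row < 3; ++row)
            for (int k = 0; k < 3; ++k)
                c.r[col * 3 + row] += a.r[k * 3 + row] * b.r[col * 3 + k];
    for (int row = 0; row < 3; ++row)
        c.t[row] = a.r[row] * b.t[0] + a.r[3 + row] * b.t[1] + a.r[6 + row] * b.t[2] + a.t[row];
    return c;
}
```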
Now all the above was for standard polygonal rendering (surface boundary representation of objects). There are also other renderers out there, like volumetric rendering or (back) ray-tracers and hybrid methods. Also the scene can have any dimensionality, not just 3D.
You can have a look at Chapter 15 of the book Computer Graphics: Principles and Practice - Third Edition by Hughes et al. That chapter derives the ray-casting and rasterization algorithms and then builds the complete source code for a software ray-tracer, software rasterizer, and hardware-accelerated rasterization renderer.