
What is the purpose of the third row of this perspective matrix? Couldn't we simply keep the original z coordinate and still be able to determine which points should be drawn in front of others? If we replaced cell (3,3) with 1 and cell (3,4) with 0, the point would be drawn at the correct x and y position, and we could still use the fourth coordinate of a point as the homogeneous coordinate. What am I missing? Thanks!

[Image: the perspective projection matrix in question]

I think that this row just scales your [near, far] range into [0, 1] for clipping. After this transformation, everything you're going to see lies in the [0, 1] range; the rest will be ignored. - kolenda

1 Answer


Yes, in theory you could keep the z-values as they are and still compare depths correctly. In practice, however, this does not work well. Also, this matrix will not work on its own, but let me come to that later. First, here is a diagram of the projected z versus the input z coordinate, assuming a near clipping plane of 1 and a far clipping plane of 5:

[Figure "Perspective Z": projected z versus input z for znear = 1, zfar = 5]
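To make the curve concrete, here is a minimal numeric sketch. It assumes the pictured matrix has (0, 0, n + f, -n*f) as its third row and (0, 0, 1, 0) as its fourth row, a common textbook form that matches the plotted output range; since the image is not reproduced here, treat that as an assumption:

```cpp
#include <cstdio>

// Projected depth after the perspective divide, assuming the matrix's
// third row is (0, 0, n + f, -n * f) and its fourth row is (0, 0, 1, 0).
// The clip-space values are then z_clip = (n + f) * z - n * f and w = z,
// so the divide yields z' = (n + f) - (n * f) / z, mapping [n, f] to [n, f].
double projectZ(double z, double n, double f) {
    return (n + f) - (n * f) / z;
}

int main() {
    const double n = 1.0, f = 5.0;  // near and far planes from the diagram
    for (double z = n; z <= f; z += 0.5)
        std::printf("z = %.1f  ->  z' = %.3f\n", z, projectZ(z, n, f));
}
```

For n = 1 and f = 5 this prints values that climb quickly at first (1.0, 2.667, 3.5, 4.0, ...) and then flatten out toward 5, exactly the shape of the curve above.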

As you can see in the diagram (and in the numbers printed above), the resulting depth still lies in the same range between 1 and 5, but the mapping is nonlinear: the curve rises steeply near the near plane and flattens out toward the far plane. The reason for that is the following:

The resulting z-values you get after the projection (normalized device coordinates) are written to the depth buffer (OpenGL performs an additional transformation to window coordinates, but let's ignore that for now). The depth buffer usually has a fixed resolution of 24 or 32 bits. Theoretically, you could encode the depth as a floating-point number, but this would waste a lot of precision. And a precise depth buffer is essential in many cases to avoid flickering artifacts (z-fighting).
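To illustrate what "encoded as some kind of integer" means, here is a minimal sketch of a fixed-point depth encoding, assuming a 24-bit buffer; the encode/decode functions are hypothetical, for illustration only:

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// A 24-bit fixed-point depth buffer can represent 2^24 equally spaced
// values in [0, 1]; this sketch shows the encode/decode round trip.
constexpr uint32_t kMaxDepth = (1u << 24) - 1;

uint32_t encodeDepth(double d) {  // d is a normalized depth in [0, 1]
    return static_cast<uint32_t>(std::lround(d * kMaxDepth));
}

double decodeDepth(uint32_t bits) {
    return static_cast<double>(bits) / kMaxDepth;
}

int main() {
    double d = 0.123456789;
    uint32_t bits = encodeDepth(d);
    std::printf("stored: %u, decoded: %.9f, step size: %.2e\n",
                bits, decodeDepth(bits), 1.0 / kMaxDepth);
}
```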

So instead, the depth value is encoded as some kind of fixed-point integer, scaled to the appropriate range between znear and zfar. Hence, the vertical axis in the above diagram is divided into equal steps. The nonlinearity now causes points close to znear to spread over a larger portion of the depth axis. A given z-range is therefore encoded with more depth steps when it is close to the camera and with fewer steps when it is far away, which results in higher precision near the camera (remember that the depth axis is subdivided uniformly). And this is usually exactly what you want: objects very close to the camera can be seen in great detail, and problems in the depth buffer would be obvious there. Objects very far away, on the other hand, cannot be seen as clearly, so there is no need for high depth resolution. The sketch below quantifies this.
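Continuing with the assumed mapping z' = (n + f) - n*f/z from above, this sketch inverts the projection and measures how much eye-space z a single uniform depth step covers at each end of the range:

```cpp
#include <cstdio>

// Inverse of the assumed projection z' = (n + f) - (n * f) / z,
// recovering the eye-space z that produced a given projected depth.
double unprojectZ(double zp, double n, double f) {
    return (n * f) / ((n + f) - zp);
}

int main() {
    const double n = 1.0, f = 5.0;
    const double step = (f - n) / 10.0;  // ten uniform steps on the depth axis
    // Eye-space z covered by the first and the last uniform depth step:
    double nearSlice = unprojectZ(n + step, n, f) - unprojectZ(n, n, f);
    double farSlice  = unprojectZ(f, n, f) - unprojectZ(f - step, n, f);
    std::printf("first depth step covers %.3f units of eye z\n", nearSlice);
    std::printf("last  depth step covers %.3f units of eye z\n", farSlice);
}
```

With n = 1 and f = 5, the first depth step covers roughly 0.09 units of eye-space z while the last covers roughly 1.43 units, so depth resolution near the camera is more than an order of magnitude finer.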

As stated before, this matrix cannot be used on its own. That is because the z-values of normalized device coordinates are expected to lie in the [0, 1] range for DirectX or the [-1, 1] range for OpenGL, where the lower bound corresponds to the near clipping plane and the upper bound to the far clipping plane. So these APIs cannot work with the resulting depth unless an additional transformation is applied.
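A minimal sketch of that additional transformation, again under the assumption that the projected depth lies in [n, f]: an affine remap into the range each API expects. In practice, real projection matrices, such as the one produced by glFrustum, fold this remap into the matrix itself rather than applying it as a separate step.

```cpp
#include <cstdio>

// The projected depth from the matrix above still lies in [n, f]; APIs
// expect normalized device z in [-1, 1] (OpenGL) or [0, 1] (DirectX).
// These helpers apply the missing affine remap after projection.
double toOpenGLNdc(double zp, double n, double f) {   // [n, f] -> [-1, 1]
    return 2.0 * (zp - n) / (f - n) - 1.0;
}

double toDirectXNdc(double zp, double n, double f) {  // [n, f] -> [0, 1]
    return (zp - n) / (f - n);
}

int main() {
    const double n = 1.0, f = 5.0;
    for (double zp = n; zp <= f; zp += 1.0)
        std::printf("z' = %.1f  ->  GL: %+.2f  DX: %.2f\n",
                    zp, toOpenGLNdc(zp, n, f), toDirectXNdc(zp, n, f));
}
```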