
I've been reading Prince's book Computer Vision: Models, Inference, and Learning to understand camera parameters and the pose estimation problem, and I'm having some trouble with the extrinsic camera parameters. As I understand it, they consist of a rotation matrix and a translation vector, and the rotation matrix transforms the world co-ordinate system into the camera co-ordinate frame. My question is whether this is a rotation matrix in the strict sense, that is, orthogonal with determinant 1.

I ask because in a subsequent chapter on geometric transformations, he describes the case where the camera is viewing a plane (the w, i.e. z, co-ordinate is 0), and introduces affine and projective transformations represented by the extrinsic camera matrix. I'm confused because such transformations can't be achieved using a rotation matrix, or am I wrong? I'm generally confused.


1 Answer


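Affine and projective transformations are represented by the projection matrix as a whole, not by the rotation matrix alone. The rotation matrix inside it is a rotation in the strict sense.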

For the typical case of a pinhole camera, you can think of the projection matrix as the product P = K * [R | t] of a 3x3 upper triangular matrix K representing the camera's intrinsic parameters, and a 3x4 roto-translation matrix [R | t], with R being a 3x3 orthonormal rotation matrix (determinant +1), and t a 3x1 translation vector. Matrix P maps a 4x1 homogeneous 3D point in the world frame to a 3x1 homogeneous 2D point in image coordinates.
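To make this concrete, here is a minimal NumPy sketch of P = K * [R | t]. The numbers in K, R and t are made-up illustrative values, and `project` is a hypothetical helper name, not something from Prince's book. It also shows my reading of the planar case from the question: for world points with z = 0, the third column of [R | t] drops out, and what remains is a general 3x3 projective transformation K * [r1 r2 t], even though R itself stays a strict rotation.

```python
import numpy as np

# Intrinsics K (upper triangular). Focal lengths and principal point
# are illustrative values, not taken from any real camera.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsics: a 30 degree rotation about the y-axis plus a translation
# (again, assumed values chosen only for the example).
theta = np.deg2rad(30.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.1, -0.2, 5.0])

# R is a rotation in the strict sense: orthogonal with determinant +1.
assert np.allclose(R.T @ R, np.eye(3))
assert np.isclose(np.linalg.det(R), 1.0)

# 3x4 projection matrix P = K [R | t].
P = K @ np.hstack([R, t.reshape(3, 1)])


def project(P, X_world):
    """Map a 3D world point to 2D pixel coordinates using P."""
    x_h = P @ np.append(X_world, 1.0)   # 3x1 homogeneous image point
    return x_h[:2] / x_h[2]             # divide out the homogeneous scale


# For points on the world plane z = 0, the third column of [R | t] is
# multiplied by zero, so the mapping reduces to a 3x3 matrix
# H = K [r1 r2 t], where r1 and r2 are the first two columns of R.
H = K @ np.column_stack([R[:, 0], R[:, 1], t])

X_plane = np.array([0.5, -0.3, 0.0])    # a point on the plane z = 0
x_from_P = project(P, X_plane)

h = H @ np.array([X_plane[0], X_plane[1], 1.0])
x_from_H = h[:2] / h[2]

# Both routes give the same pixel, but H is not a rotation matrix.
assert np.allclose(x_from_P, x_from_H)
assert not np.allclose(H.T @ H, np.eye(3))
```

So the projective transformation in the planar case comes from combining K, two columns of R, and t; no non-rotational R is needed.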

The columns of R are, in order, the components of the x, y, z world frame axes expressed in camera coordinates. Vector t is the displacement from the origin of the camera frame to the origin of the world frame, also expressed in camera coordinates.
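A quick numerical check of that interpretation, assuming the convention X_cam = R * X_world + t implied by P = K * [R | t]. The values of R and t are arbitrary, and `world_to_camera` is a hypothetical helper name used only for this sketch.

```python
import numpy as np

# Assumed example extrinsics: 90 degree rotation about z, arbitrary t.
theta = np.deg2rad(90.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([1.0, 2.0, 3.0])


def world_to_camera(R, t, X_world):
    """Apply the rigid transform X_cam = R @ X_world + t."""
    return R @ X_world + t


# The world origin lands at t in camera coordinates.
origin_cam = world_to_camera(R, t, np.zeros(3))
assert np.allclose(origin_cam, t)

# Each world axis direction, seen from the camera, is a column of R.
for i in range(3):
    unit_axis = np.eye(3)[i]
    axis_cam = world_to_camera(R, t, unit_axis) - origin_cam
    assert np.allclose(axis_cam, R[:, i])

# Conversely, the camera centre in world coordinates is C = -R^T t,
# which is generally not equal to t or -t.
C = -R.T @ t
assert np.allclose(world_to_camera(R, t, C), np.zeros(3))
```

The last step is worth keeping in mind: t is not the camera position in the world frame; that position is -R^T t.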