
I am looking at the KITTI dataset, and in particular at how to convert a world point into image coordinates. The README (quoted below) says I need to transform into camera coordinates first and then multiply by the projection matrix. I have two questions, coming from a non-computer-vision background:

  1. Looking at the numbers in calib.txt, each projection matrix is 3x4 with non-zero values in the last column. I always thought this matrix = K[I|0], where K is the camera's intrinsic matrix. So why is the last column non-zero, and what does it mean? For example, P2 is
array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
       [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
       [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
  2. After applying the projection to get [u, v, w] and dividing u and v by w, are the resulting values relative to an origin at the center of the image, or at the top-left corner of the image?

README:

calib.txt: Calibration data for the cameras: P0/P1 are the 3x4 projection matrices after rectification. Here P0 denotes the left and P1 denotes the right camera. Tr transforms a point from velodyne coordinates into the left rectified camera coordinate system. In order to map a point X from the velodyne scanner to a point x in the i'th image plane, you thus have to transform it like:

  x = Pi * Tr * X
Were you able to understand or solve this problem? – Sathish

2 Answers


All of the P matrices (3x4) have the form:

P(i)rect = [[fu  0  cx  -fu*bx],
            [ 0 fv  cy  -fv*by],
            [ 0  0   1       0]]

The last column encodes the baseline (in meters, scaled by the focal length) with respect to the reference camera 0. P0 has all zeros in its last column because it is the reference camera itself.
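Since P[0, 3] = -fu * bx, the baseline relative to camera 0 can be recovered by dividing by the focal length. A quick sanity check, using the P2 values from the question:

```python
import numpy as np

P2 = np.array([[7.070912e+02, 0.0, 6.018873e+02, 4.688783e+01],
               [0.0, 7.070912e+02, 1.831104e+02, 1.178601e-01],
               [0.0, 0.0, 1.0, 6.203223e-03]])

# P[0, 3] = -fu * bx  =>  bx = -P[0, 3] / fu
bx = -P2[0, 3] / P2[0, 0]
print(bx)  # about -0.066, i.e. camera 2 sits roughly 6 cm from camera 0
```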

This post has more details: How Kitti calibration matrix was calculated?


Refs:

  1. How to understand the KITTI camera calibration files?
  2. Format of parameters in KITTI's calibration file
  3. http://www.cvlibs.net/publications/Geiger2013IJRR.pdf

Answer:

I strongly recommend reading the references above; they should answer most, if not all, of your questions.

For question 2: the projected points are expressed with the origin at the top left of the image. See refs 2 and 3: a 3D point very far away along the optical axis projects to (center_x, center_y), the principal-point values stored in the P_rect matrices. You can also verify this with a short script:

import numpy as np

p = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
              [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
              [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
x = [0, 0, 1e8, 1]  # homogeneous 3D point, very far along the optical axis
y = np.dot(p, x)    # homogeneous image point [u*w, v*w, w]
y = y[:2] / y[2]    # perspective divide to get pixel coordinates (u, v)
print(y)

You will see output close to:

[6.018873e+02 1.831104e+02]

which is essentially (p[0, 2], p[1, 2]), i.e. (center_x, center_y).
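For completeness, the README's full mapping x = Pi * Tr * X can be sketched as below. This is a minimal illustration, not the official devkit code: P2 is taken from the question, and Tr is a placeholder identity here, since the real 3x4 velodyne-to-camera transform comes from calib.txt.

```python
import numpy as np

P2 = np.array([[7.070912e+02, 0.0, 6.018873e+02, 4.688783e+01],
               [0.0, 7.070912e+02, 1.831104e+02, 1.178601e-01],
               [0.0, 0.0, 1.0, 6.203223e-03]])
Tr = np.eye(4)[:3]  # placeholder: identity in place of the real Tr

def project_to_image(X_velo, P, Tr):
    """Project a 3D point in velodyne coordinates to pixel coordinates (u, v)."""
    X = np.append(X_velo, 1.0)     # homogeneous 4-vector
    X_cam = Tr @ X                 # rectified camera coordinates (3-vector)
    x = P @ np.append(X_cam, 1.0)  # homogeneous image point [u*w, v*w, w]
    return x[:2] / x[2]            # perspective divide; origin at top-left

# Example: a point 20 m ahead, 2 m right, 0.5 m down (camera convention)
uv = project_to_image(np.array([2.0, 0.5, 20.0]), P2, Tr)
```

Points behind the camera (w <= 0) should be discarded before the divide; the function above assumes the point is in front of the camera.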