2
votes

I am using the 'extrinsics' function in Matlab 2014a. This function returns the translation and rotation of the camera relative to the world coordinate system. As I understand it, the translation vector returned by this function, call it C, is a translation from camera coordinates to world coordinates. In other words, this vector can be interpreted as the position of the camera center in world coordinates. Now I am confused.

If C is the translation from camera to world, then the translation from world to camera should be T = -transp(R)*C, where R is the rotation matrix returned by the 'extrinsics' function. But in the Matlab example http://www.mathworks.com/com/help/vision/examples/sparse-3-d-reconstruction-from-multiple-views.html, C rather than T is used as the translation from world coordinates to camera coordinates. Why?

3 Answers

2
votes

I would use the de facto standard definition from Hartley and Zisserman to pin this down more precisely. That definition underpins the Matlab library (indeed, see the references at the end of the Matlab page you linked!).

Thus, define the image point x and the scene point X, such that:

x = P X

where P is the projection matrix that combines the camera calibration matrix K (intrinsics) with the external transformation matrix [R | t] (extrinsics), such that:

P = K [R | t]

Here is the same in a free tutorial paper. Notice the first couple of slides.
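
As a rough illustration of x = P X with P = K [R | t], here is a minimal pure-Python sketch in the column-vector convention. The helper names (matvec, project) and all numeric values are made up for the example; they are not taken from Matlab or from the book.

```python
# Minimal pinhole projection x = K [R | t] X (column-vector convention).
# Every number below is an illustrative assumption, not calibration data.

def matvec(M, v):
    """Multiply matrix M (a list of rows) by column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def project(K, R, t, X_world):
    """Project a 3-D world point to pixel coordinates."""
    # World frame -> camera frame: X_cam = R * X_world + t
    X_cam = [r + ti for r, ti in zip(matvec(R, X_world), t)]
    # Camera frame -> homogeneous image point
    u, v, w = matvec(K, X_cam)
    # Divide out the arbitrary scale factor
    return (u / w, v / w)

K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]   # camera axes aligned with world axes
t = [0.0, 0.0, 5.0]     # world origin lies 5 units in front of the camera

pixel = project(K, R, t, [1.0, 0.5, 0.0])
print(pixel)
```

Note that t is added after rotating the world point, so in this form it is the world-to-camera translation rather than the camera center.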

I hope that helps!

2
votes

@timlukins is correct. R and t returned by the extrinsics function represent the transformation from the world coordinates into the camera-based coordinates. The equation is as follows:

s*[x y 1] = [X Y Z 1]*[R; t]*K

where [X Y Z] are world coordinates, [x y] are image coordinates in pixels, K is the intrinsic matrix, and s is an arbitrary scale factor. Note that this equation looks different from the one in Hartley and Zisserman because MATLAB uses a pre-multiply convention: all the vectors are row vectors, all the matrices are transposed, and the order of multiplication is reversed.

This equation is given in the documentation for the cameraParameters object.
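
To make the difference between the two conventions concrete, here is a small pure-Python sketch that projects the same point both ways and gets the same pixel. The variable names (K_col, R_row, and so on) and the numbers are assumptions chosen for the example, not Matlab identifiers.

```python
# Column-vector form (Hartley and Zisserman) versus row-vector form (MATLAB).
# All names and numbers here are illustrative assumptions.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    """Matrix times column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vecmat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v)))
            for j in range(len(M[0]))]

K_col = [[800.0, 0.0, 320.0],
         [0.0, 800.0, 240.0],
         [0.0, 0.0, 1.0]]
R_col = [[0.0, -1.0, 0.0],   # 90 degree rotation about the z axis
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 10.0]
X = [1.0, 2.0, 0.0]          # world point

# Column convention: s * x = K * (R * X + t)
cam_col = [a + b for a, b in zip(matvec(R_col, X), t)]
u, v, w = matvec(K_col, cam_col)
pixel_col = (u / w, v / w)

# Row convention: s * [x y 1] = [X Y Z 1] * [R; t] * K,
# with R and K transposed relative to the column form.
R_row = transpose(R_col)
K_row = transpose(K_col)
Rt = R_row + [t]             # 4x3 matrix: R stacked on top of t
cam_row = vecmat(X + [1.0], Rt)
u, v, w = vecmat(cam_row, K_row)
pixel_row = (u / w, v / w)

print(pixel_col, pixel_row)
```

In both forms t is applied after the rotation, so it plays the same role: the world-to-camera translation.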

0
votes

Define C as the camera position in world coordinates and T as the translation vector from world coordinates to camera coordinates. Matlab plots the chessboard in camera coordinates, which means one needs to recompute the chessboard position in camera coordinates. The chessboard origin is the world origin, and its position in camera coordinates is exactly T. So Matlab plots the chessboard with T directly.
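
The relation between C and T described above can be checked numerically. The sketch below assumes the column-vector convention, where X_cam = R * X_world + T, so that C = -transp(R) * T and T = -R * C. In the row-vector form X_cam = X_world * R + T the same reasoning gives C = -T * transp(R). The function names and values are made up for illustration.

```python
# Camera center C versus world-to-camera translation T (column convention).
# Names and numbers are illustrative assumptions.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    """Matrix times column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def world_to_camera(R, T, X_world):
    """X_cam = R * X_world + T"""
    return [a + b for a, b in zip(matvec(R, X_world), T)]

R = [[0.0, -1.0, 0.0],   # 90 degree rotation about the z axis
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
T = [1.0, 2.0, 10.0]

# Camera center in world coordinates: C = -transp(R) * T
C = [-c for c in matvec(transpose(R), T)]

# The camera center must land on the origin of the camera frame.
center_in_cam = world_to_camera(R, T, C)

# The world origin (the chessboard origin) lands on T.
origin_in_cam = world_to_camera(R, T, [0.0, 0.0, 0.0])

print(C, center_in_cam, origin_in_cam)
```

So C and T describe the same pose from opposite sides, and they coincide only up to sign when R is the identity.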