This is a question about terminology used in computer graphics and vision. I want to construct a camera projection matrix from 2D–3D correspondences. From these correspondences I create a camera object, using a class from a library to represent the camera. The class takes the following parameters:
// R is the orientation of the camera expressed in a world coordinate frame
// t is the position of the camera expressed in a world coordinate frame
The first part of my question is: are R and t, as defined above, the extrinsic parameters satisfying x = K[R|t]X? Or do they need to be converted first (for example, transpose(R) as the extrinsic rotation and -transpose(R)*t as the extrinsic translation), as sketched below?
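To make the conversion I am referring to concrete, here is a minimal sketch in C++ with OpenCV; the function name cameraPoseToExtrinsics and the variable names are just for illustration, not part of the library I am using:

```cpp
#include <opencv2/core.hpp>

// Sketch of the conversion in question, assuming R (3x3) is the camera
// orientation and t (3x1) is the camera position, both expressed in the
// world coordinate frame.
cv::Mat cameraPoseToExtrinsics(const cv::Mat& R, const cv::Mat& t)
{
    cv::Mat R_ext = R.t();       // candidate extrinsic rotation
    cv::Mat t_ext = -R_ext * t;  // candidate extrinsic translation
    cv::Mat Rt;                  // 3x4 matrix [R_ext | t_ext]
    cv::hconcat(R_ext, t_ext, Rt);
    return Rt;                   // P = K * Rt would then be the projection matrix
}
```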
I am obtaining R and t using OpenCV's solvePnP function, whose documentation describes the outputs as follows:
rvec – Output rotation vector (see Rodrigues()) that, together with tvec, brings points from the model coordinate system to the camera coordinate system.
tvec – Output translation vector.
The second part of my question is: based on the descriptions above, are these outputs already my camera's extrinsic parameters, or do they also need to be transformed (in the way described above)?
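For reference, my call looks roughly like the sketch below; the point lists, the intrinsic matrix K, and the function name estimatePose are placeholders rather than my actual code:

```cpp
#include <vector>
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>

// Rough sketch of how I obtain rvec and tvec from 2D-3D correspondences.
void estimatePose(const std::vector<cv::Point3f>& worldPoints,
                  const std::vector<cv::Point2f>& imagePoints,
                  const cv::Mat& K)
{
    cv::Mat rvec, tvec;
    cv::solvePnP(worldPoints, imagePoints, K, cv::Mat(), rvec, tvec);

    // Convert the rotation vector to a 3x3 rotation matrix.
    cv::Mat R;
    cv::Rodrigues(rvec, R);

    // My question: is [R | tvec] already the extrinsic part of
    // x = K[R|t]X, or do I need R.t() and -R.t() * tvec instead?
}
```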