This is a question about terminology used in computer graphics and vision. I want to construct a camera projection matrix from 2D–3D correspondences. From these correspondences I create a camera object, using a class from a library to represent the camera. The class takes the following parameters:
// R is the orientation of the camera expressed in a world coordinate frame
// t is the position of the camera expressed in a world coordinate frame
The first part of my question is: are R and t, as defined above, the extrinsic parameters satisfying x = K[R|t]X? Or do they need to be converted first (for example, transpose(R) as the extrinsic rotation and -transpose(R)*t as the extrinsic translation), as sketched below?
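To make the conversion I am referring to concrete, here is a minimal sketch in C++ with OpenCV; the function name cameraPoseToExtrinsics and the variable names are just for illustration, not part of the library I am using:

```cpp
#include <opencv2/core.hpp>

// Sketch of the conversion in question, assuming R (3x3) is the camera
// orientation and t (3x1) is the camera position, both expressed in the
// world coordinate frame.
cv::Mat cameraPoseToExtrinsics(const cv::Mat& R, const cv::Mat& t)
{
    cv::Mat R_ext = R.t();       // candidate extrinsic rotation
    cv::Mat t_ext = -R_ext * t;  // candidate extrinsic translation
    cv::Mat Rt;                  // 3x4 matrix [R_ext | t_ext]
    cv::hconcat(R_ext, t_ext, Rt);
    return Rt;                   // P = K * Rt would then be the projection matrix
}
```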
I am obtaining R and t using OpenCV's solvePnP function, whose documentation describes the outputs as follows:
rvec – Output rotation vector (see Rodrigues()) that, together with tvec, brings points from the model coordinate system to the camera coordinate system.
tvec – Output translation vector.
The second part of my question is: based on the descriptions above, are these outputs already my camera's extrinsic parameters, or do they also need to be transformed (in the way described above)?
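For reference, my call looks roughly like the sketch below; the point lists, the intrinsic matrix K, and the function name estimatePose are placeholders rather than my actual code:

```cpp
#include <vector>
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>

// Rough sketch of how I obtain rvec and tvec from 2D-3D correspondences.
void estimatePose(const std::vector<cv::Point3f>& worldPoints,
                  const std::vector<cv::Point2f>& imagePoints,
                  const cv::Mat& K)
{
    cv::Mat rvec, tvec;
    cv::solvePnP(worldPoints, imagePoints, K, cv::Mat(), rvec, tvec);

    // Convert the rotation vector to a 3x3 rotation matrix.
    cv::Mat R;
    cv::Rodrigues(rvec, R);

    // My question: is [R | tvec] already the extrinsic part of
    // x = K[R|t]X, or do I need R.t() and -R.t() * tvec instead?
}
```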