Camera pose from solvePnP

Question

Goal

I need to retrieve the position and attitude angles of a camera (using OpenCV / Python).

Definitions

Attitude angles are defined by:

Yaw being the general orientation of the camera when it lies on a horizontal plane: toward north = 0°, toward east = 90°, south = 180°, west = 270°, etc.

Pitch being the "nose" orientation of the camera: 0° = looking horizontally at a point on the horizon, -90° = looking down vertically, +90° = looking up vertically, 45° = looking up at an angle of 45° from the horizon, etc.

Roll being the left or right tilt of the camera when it is in your hands (so it keeps looking at the same point on the horizon while this angle varies): +45° = tilted 45° clockwise as seen by the person holding the camera; thus +90° (or -90°) would be the angle needed for a portrait picture, for example.

World reference frame:

My world reference frame is oriented as follows:

Toward east = +X
Toward north = +Y
Up toward the sky = +Z

My world objects points are given in that reference frame.

Camera reference frame:

According to the doc, the camera reference frame is oriented like this (OpenCV convention: +X to the right, +Y down, +Z forward along the optical axis):

What to achieve

Now, from cv2.solvePnP() applied to a set of image points and their corresponding world coordinates, I have computed both rvec and tvec.
But, according to the doc: http://docs.opencv.org/trunk/d9/d0c/group__calib3d.html#ga549c2075fac14829ff4a58bc931c033d , they are:

rvec ; Output rotation vector (see Rodrigues()) that, together with tvec, brings points from the model coordinate system to the camera coordinate system.
tvec ; Output translation vector.

These vectors describe the transformation from the world reference frame into the camera reference frame.
I need to do the exact inverse operation, thus retrieving camera position and attitude relative to world coordinates.

Camera position:

So I have computed the rotation matrix from rvec with Rodrigues():

rmat = cv2.Rodrigues(rvec)[0]

And if I'm right here, the camera position expressed in the world coordinates system is given by:

camera_position = -np.matrix(rmat).T * np.matrix(tvec)

(src: Camera position in world coordinate from cv::solvePnP )
The result looks plausible.
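As a sanity check on that formula, here is a minimal sketch using plain NumPy arrays instead of np.matrix. The function name and the pose values are hypothetical, and no OpenCV call is made; rmat and tvec stand in for the Rodrigues() output and the (3, 1) vector that solvePnP returns. The check relies on one fact only: the camera centre must map to the origin of the camera frame.

```python
import numpy as np

def camera_position_from_pose(rmat, tvec):
    """Camera centre in world coordinates, given x_cam = rmat @ x_world + tvec."""
    rmat = np.asarray(rmat, dtype=float).reshape(3, 3)
    tvec = np.asarray(tvec, dtype=float).reshape(3)  # accepts (3,), (3, 1) or (1, 3)
    return -rmat.T @ tvec

# Hypothetical pose: 90 degree rotation about Z plus a translation.
theta = np.deg2rad(90.0)
rmat = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta),  np.cos(theta), 0.0],
                 [0.0,            0.0,           1.0]])
tvec = np.array([[1.0], [2.0], [3.0]])  # column vector, same shape as solvePnP's output

position = camera_position_from_pose(rmat, tvec)

# Mapping the camera centre back through the pose should give the zero vector.
residual = rmat @ position + tvec.reshape(3)
print(position, residual)
```

Flattening tvec first avoids the shape surprises that come from mixing (3, 1) columns with 1-D arrays.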

Camera attitude (yaw, pitch and roll):

But how do I retrieve the corresponding attitude angles (yaw, pitch and roll as described above) from the point of view of the camera (basically as if it were in your hands)?

I have tried implementing this: http://planning.cs.uiuc.edu/node102.html#eqn:yprmat in a function:

import math
import numpy as np

def rotation_matrix_to_attitude_angles(R):
    # Decompose R = Rz(alpha) * Ry(beta) * Rx(gamma); angles are returned in radians.
    cos_beta = math.sqrt(R[2,1] * R[2,1] + R[2,2] * R[2,2])
    singular = cos_beta < 1e-6  # gimbal lock: pitch is close to +/-90 degrees
    if not singular:
        alpha = math.atan2(R[1,0], R[0,0])    # yaw   [z]
        beta  = math.atan2(-R[2,0], cos_beta) # pitch [y]
        gamma = math.atan2(R[2,1], R[2,2])    # roll  [x]
    else:
        # R[1,0] and R[0,0] both vanish here, so fix roll at 0 and read yaw from the middle column.
        alpha = math.atan2(-R[0,1], R[1,1])   # yaw   [z]
        beta  = math.atan2(-R[2,0], cos_beta) # pitch [y]
        gamma = 0.0                           # roll  [x]
    return np.array([alpha, beta, gamma])

But results are not consistent with what I want. For example, I have a roll angle of ~ -90°, but the camera is horizontal so it should be around 0.

The pitch angle is around 0, so it seems correctly determined, but I don't really understand why it's around 0: the Z-axis of the camera reference frame is horizontal, so it has already been tilted by 90° from the vertical axis of the world reference frame. I would have expected a value of -90° or +270° here. Anyway.

And yaw seems good. Mainly.

Question

Did I miss something with the roll angle?

I am having the EXACT same problem: I get the camera position using the same procedure as you, and it looks correct. But the yaw, pitch and roll angles seem to make no sense... Did you manage to fix this problem? I am very interested. — user2756724

user3282375 · Accepted Answer · 2021-04-20T03:27:36

I think your transformation is missing a rotation. If I interpret your question correctly, you are asking for the inverse of (rotation by R followed by translation by T):

$\{\hat{R}|\vec{T}\}\cdot\vec{r}=\hat{R}\cdot\vec{r}+\vec{T}$

The inverse, composed with the original transformation, should return the identity:

$\{\hat{R}|\vec{T}\}^{-1}\cdot\{\hat{R}|\vec{T}\}=\{\hat{1}|\vec{0}\}$

Working this through yields

$\{\hat{R}|\vec{T}\}^{-1}=\{\hat{R}^{-1}|-\hat{R}^{-1}\cdot\vec{T}\}$

As far as I can tell, you were using the $-\hat{R}^{-1}\cdot\vec{T}$ part of the answer (undoing the translation) but leaving out the inverse rotation $\hat{R}^{-1}$.
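The closed form can be checked numerically. The sketch below (NumPy only; the helper names and pose values are hypothetical) packs a pose into a 4x4 homogeneous matrix and compares the closed-form inverse with a generic matrix inverse. For a rotation matrix the inverse is simply the transpose.

```python
import numpy as np

def invert_pose(rmat, tvec):
    """Closed-form inverse of x -> rmat @ x + tvec: returns (R^-1, -R^-1 @ T)."""
    rmat = np.asarray(rmat, dtype=float).reshape(3, 3)
    tvec = np.asarray(tvec, dtype=float).reshape(3)
    r_inv = rmat.T  # inverse of a rotation matrix is its transpose
    return r_inv, -r_inv @ tvec

def to_homogeneous(rmat, tvec):
    """Pack a rotation and a translation into a 4x4 homogeneous matrix."""
    m = np.eye(4)
    m[:3, :3] = rmat
    m[:3, 3] = np.asarray(tvec, dtype=float).reshape(3)
    return m

# Hypothetical pose: 30 degree rotation about X plus a translation.
a = np.deg2rad(30.0)
rmat = np.array([[1.0, 0.0,        0.0],
                 [0.0, np.cos(a), -np.sin(a)],
                 [0.0, np.sin(a),  np.cos(a)]])
tvec = np.array([0.5, -1.0, 2.0])

r_inv, t_inv = invert_pose(rmat, tvec)
closed_form = to_homogeneous(r_inv, t_inv)
numeric = np.linalg.inv(to_homogeneous(rmat, tvec))

# Composing the pose with its inverse should give the 4x4 identity.
round_trip = closed_form @ to_homogeneous(rmat, tvec)
print(np.allclose(closed_form, numeric), np.allclose(round_trip, np.eye(4)))
```

Here t_inv is the camera position in world coordinates, and r_inv is the camera orientation that the attitude angles should be read from.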

Rotation+Translation:

$\{\hat{R}|\vec{T}\}\cdot\vec{r}=\hat{R}\cdot\vec{r}+\vec{T}$

Inverse of (Rotation+Translation):

$\{\hat{R}|\vec{T}\}^{-1}\cdot\vec{r}=\hat{R}^{-1}\cdot\vec{r}-\hat{R}^{-1}\cdot\vec{T}$

In plain (non-LaTeX) notation: (R^-1*r - R^-1*T) is the inverse of (R*r + T).
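One possible way to turn the inverse rotation into the angles defined in the question is sketched below. It is not a tested solution to the original problem and it rests on assumptions that are not stated in this answer: world axes X = east, Y = north, Z = up (as in the question), the OpenCV camera axes x = right, y = down, z = forward, and a hypothetical function name. It reads the camera axes directly from the matrix instead of using a generic Euler decomposition, because even a level camera has axes that are rotated relative to the world axes.

```python
import numpy as np

def attitude_from_world_to_camera(rmat):
    """Yaw, pitch, roll in degrees for a world-to-camera rotation (x_cam = rmat @ x_world).

    Assumes world axes X=east, Y=north, Z=up and camera axes x=right, y=down, z=forward.
    """
    rmat = np.asarray(rmat, dtype=float).reshape(3, 3)
    # Rows of rmat are the camera axes expressed in world coordinates
    # (equivalently, the columns of the inverse rotation rmat.T).
    right, _down, forward = rmat

    yaw = np.degrees(np.arctan2(forward[0], forward[1])) % 360.0   # 0 = north, 90 = east
    pitch = np.degrees(np.arcsin(np.clip(forward[2], -1.0, 1.0)))  # positive = looking up

    # Zero-roll reference axes for the same viewing direction.
    level_right = np.cross(forward, [0.0, 0.0, 1.0])
    norm = np.linalg.norm(level_right)
    if norm < 1e-9:
        raise ValueError("camera looks straight up or down: yaw and roll are ambiguous")
    level_right /= norm
    level_down = np.cross(forward, level_right)

    # Positive roll = clockwise as seen from behind the camera (right side drops).
    roll = np.degrees(np.arctan2(right @ level_down, right @ level_right))
    return yaw, pitch, roll

# Hypothetical test poses; rows are the camera's right, down and forward axes in world coordinates.
level_north = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
level_east = np.array([[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])
portrait_north = np.array([[0.0, 0.0, -1.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

angles_north = attitude_from_world_to_camera(level_north)
angles_east = attitude_from_world_to_camera(level_east)
angles_portrait = attitude_from_world_to_camera(portrait_north)
print(angles_north, angles_east, angles_portrait)
```

In use, rmat would be the matrix obtained from cv2.Rodrigues(rvec)[0]. The straight-up and straight-down cases raise an error here because yaw and roll cannot be separated for those viewing directions.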