
In a nutshell... the translation returned from decomposing a homography matrix is actually a 3x1 matrix (really a vector). Yet every description of a translation matrix I can find is a 3x2 matrix.

Here are the two images (IR camera). The position 1 image (approx. camera Cartesian coords x = 0 mm, y = 300 mm): initial image, x=0mm, y=300mm

This is the position 2 image (approx. camera Cartesian coords x = 680 mm, y = 0 mm): final image, x=680 mm, y=0mm

I used the following call with 90+ matched points to determine the homography matrix (M):

M, mask = cv2.findHomography(source_pts, destination_pts, cv2.RANSAC, 5.0)

This process picked out a good number of keypoints: matching keypoints between the two images
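In case it's useful, here is a minimal sketch of how source_pts, destination_pts and the RANSAC inlier mask can be produced; the ORB/BFMatcher choice here is purely illustrative, not my exact pipeline:

```python
import cv2
import numpy as np

# Illustrative detector/matcher; any feature pipeline that yields matched points will do.
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

# findHomography expects Nx1x2 float32 arrays of corresponding points.
source_pts = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
destination_pts = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

M, mask = cv2.findHomography(source_pts, destination_pts, cv2.RANSAC, 5.0)

# mask flags the RANSAC inliers; keep only those for the comparisons below.
src_pts = source_pts[mask.ravel() == 1]
dst_pts = destination_pts[mask.ravel() == 1]
```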

If I apply this homography matrix to the original image, it works perfectly:

im_out = cv2.warpPerspective(img1, M, (640, 480))

homography applied to original image
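To sanity-check the warp numerically, one can difference the warped image against the position 2 image (a sketch, assuming img2 holds that image):

```python
import cv2
import numpy as np

# Warp image 1 with the estimated homography, then compare to image 2.
im_out = cv2.warpPerspective(img1, M, (640, 480))
diff = cv2.absdiff(im_out, img2)
print("mean absolute difference:", float(np.mean(diff)))
```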

and the mean difference between the point sets:

np.mean(dst_pts - src_pts, axis=0)

array([[-305.16345, -129.94157]], dtype=float32)

is fairly close to the result of applying the homography matrix to a single point:

np.dot(M, [1, 1, 1])

array([-293.00352303, -132.93478376, 1.00009461])
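Strictly speaking, np.dot(M, [1, 1, 1]) returns a homogeneous coordinate that still needs dividing by its third component before comparing against pixel displacements; a quick sketch of that normalization (equivalently, cv2.perspectiveTransform), assuming M, src_pts and dst_pts from above:

```python
import cv2
import numpy as np

# Applying the homography to a point gives homogeneous coords (x', y', w');
# the mapped pixel is (x'/w', y'/w').
p = np.dot(M, [1.0, 1.0, 1.0])
p_mapped = p[:2] / p[2]
print(p_mapped)

# cv2.perspectiveTransform does the same normalization for all points at once.
mapped = cv2.perspectiveTransform(np.float32(src_pts).reshape(-1, 1, 2), M)
print(np.mean(mapped - np.float32(src_pts).reshape(-1, 1, 2), axis=0))
print(np.mean(dst_pts - src_pts, axis=0))
```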

I decomposed the homography matrix with the following command:

num, Rs, Ts, Ns = cv2.decomposeHomographyMat(M, camera_matrix)

This returns the number of solutions (num, here 4), a list of rotation matrices (Rs), a list of translation vectors (Ts), and a list of plane normals (Ns).

I'm interested in the translation matrix.

Firstly, the translation output Ts lists the 4 candidate solutions (is this correct?):

Ts = [array([[-0.60978834], [-0.26268874], [ 0.01638967]]),
      array([[ 0.60978834], [ 0.26268874], [-0.01638967]]),
      array([[-0.19035409], [-0.06628793], [ 0.63284046]]),
      array([[ 0.19035409], [ 0.06628793], [-0.63284046]])]
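For reference, the four (R, t, n) candidates can be looped over like this; preferring the candidate whose plane normal points most nearly along the camera's optical axis is only an illustrative heuristic, not something established above:

```python
import numpy as np

# Rs, Ts, Ns from cv2.decomposeHomographyMat are lists of 3x3 rotation
# matrices, 3x1 translation vectors and 3x1 plane normals.
for i, (R, t, n) in enumerate(zip(Rs, Ts, Ns)):
    print(f"candidate {i}: t = {t.ravel()}, n = {n.ravel()}")

# Illustrative heuristic only: keep the candidate whose plane normal points
# most nearly along the camera's optical axis [0, 0, 1].
best = max(range(len(Ns)), key=lambda i: float(np.dot(Ns[i].ravel(), [0.0, 0.0, 1.0])))
print("selected candidate:", best, "t =", Ts[best].ravel())
```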

Secondly, and most puzzling, each of the solutions has 3 values...

e.g., the first solution: [-0.6097, -0.2626, 0.01638967].

My understanding is that a translation matrix would have the form of: translation matrix

Here is my reference

How do I get from the values returned by the decomposition to a translation matrix in the form above? **I.e., how do I convert this: [-0.6097, -0.2626, 0.01638967]

to this format:** translation matrix

Thanks for your help.

IMHO, the solution is for 3D space, so your camera can move in tx, ty and tz. Similarly, a 3D rotation has 3 degrees of freedom (a 3D rotation axis, i.e. a normalized vector with dof = 2, plus an angle). – Micka
Hi Micka, I thought this was the case at first also. However, the M matrix is 2D, the image is 2D, and solutions are usually determined using homogeneous coordinates, I believe. Any movement of the camera on the Z axis would result in an enlargement of the image. A third axis does not seem meaningful. My main question remains: how is this 3x1 matrix (vector) equivalent to the 3x2 translation matrix? – userX
"This function extracts relative camera motion between two views observing a planar object from the homography H induced by the plane" - from my perspective this is camera motion in 3D space. If the camera moves in the z direction you might see a scaling in the M matrix. Decomposing a homography into its 2D transformation elements would give you: scaling, shearing, rotation, translation and the perspective part. – Micka
Thanks for that perspective. There will be a component in the z direction. However, the question remains: how do you convert this 3x1 vector into a translation matrix (3x2)? – userX
Further to that comment, if I take the three values as a 3D displacement vector, it does not seem to make sense. For example, -0.6097 * 640 pixels gives a lateral move of approx -390 pixels in that direction. However, subtracting the x values of the source and destination match points yields an actual move of -305 pixels. These are very different when you plot them, and not accounted for by the very small rotation observed, where the diagonal of the rotation matrix is > 0.9994. Surely there is an equivalent translation matrix (3x2) that would make clear what this 3-value vector actually is. – userX

1 Answer


Let's take your first translation vector:

np.array([-0.60978834, -0.26268874, 0.01638967])

To me it looks like those are your estimated tx, ty and tz translation components. Those quantities also make sense when I look at the image with the green dots. So I guess that your translation matrix in homogeneous coordinates would be:

M = np.array([[1, 0, 0, -0.60978834], [0, 1, 0, -0.26268874], [0, 0, 1, 0.01638967]])

Or simply:

M = np.array([[1, 0, -0.60978834], [0, 1, -0.26268874]])

if you ignore the tz component. Isn't this what you're looking for?
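For illustration, here is a minimal sketch of dropping that 2x3 matrix into cv2.warpAffine. Keep in mind that, as far as I can tell, the decomposition values are in normalized units rather than pixels, so this only demonstrates the matrix format, not a pixel-accurate re-alignment:

```python
import cv2
import numpy as np

# 2x3 affine matrix from above: identity rotation plus (tx, ty) of the first
# decomposition candidate, with tz dropped.
T = np.array([[1, 0, -0.60978834],
              [0, 1, -0.26268874]], dtype=np.float32)

# warpAffine treats the last column as a shift in pixels, so this is only a
# format demonstration using the decomposition's (non-pixel) values.
shifted = cv2.warpAffine(img1, T, (640, 480))
```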