
In a nutshell... the translation returned from decomposing a homography matrix is actually a 3x1 matrix (really a vector). Yet every description of a translation matrix I can find is a 3x2 matrix.

Here are the two images (IR camera). The position 1 image (approx. camera Cartesian coords x = 0 mm, y = 300 mm): initial image, x=0mm, y=300mm

This is the position 2 image (approx. camera Cartesian coords x = 680 mm, y = 0 mm): final image, x=680 mm, y=0mm

I used the following call with 90+ matched points to determine the homography matrix (M):

M, mask = cv2.findHomography(source_pts, destination_pts, cv2.RANSAC, 5.0)

This process picked out a good number of keypoints: matching keypoints between the two images
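In case it's useful, here is a minimal sketch of how source_pts, destination_pts and the RANSAC inlier mask can be produced; the ORB/BFMatcher choice here is purely illustrative, not my exact pipeline:

```python
import cv2
import numpy as np

# Illustrative detector/matcher; any feature pipeline that yields matched points will do.
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

# findHomography expects Nx1x2 float32 arrays of corresponding points.
source_pts = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
destination_pts = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

M, mask = cv2.findHomography(source_pts, destination_pts, cv2.RANSAC, 5.0)

# mask flags the RANSAC inliers; keep only those for the comparisons below.
src_pts = source_pts[mask.ravel() == 1]
dst_pts = destination_pts[mask.ravel() == 1]
```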

If I apply this homography matrix to the original image, it works perfectly:

im_out = cv2.warpPerspective(img1, M, (640, 480))

homography applied to original image
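To sanity-check the warp numerically, one can difference the warped image against the position 2 image (a sketch, assuming img2 holds that image):

```python
import cv2
import numpy as np

# Warp image 1 with the estimated homography, then compare to image 2.
im_out = cv2.warpPerspective(img1, M, (640, 480))
diff = cv2.absdiff(im_out, img2)
print("mean absolute difference:", float(np.mean(diff)))
```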

and the mean difference between the point sets:

np.mean(dst_pts - src_pts, axis=0)

array([[-305.16345, -129.94157]], dtype=float32)

is fairly close to the result of applying the homography matrix to a single point:

np.dot(M, [1, 1, 1])

array([-293.00352303, -132.93478376, 1.00009461])
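Strictly speaking, np.dot(M, [1, 1, 1]) returns a homogeneous coordinate that still needs dividing by its third component before comparing against pixel displacements; a quick sketch of that normalization (equivalently, cv2.perspectiveTransform), assuming M, src_pts and dst_pts from above:

```python
import cv2
import numpy as np

# Applying the homography to a point gives homogeneous coords (x', y', w');
# the mapped pixel is (x'/w', y'/w').
p = np.dot(M, [1.0, 1.0, 1.0])
p_mapped = p[:2] / p[2]
print(p_mapped)

# cv2.perspectiveTransform does the same normalization for all points at once.
mapped = cv2.perspectiveTransform(np.float32(src_pts).reshape(-1, 1, 2), M)
print(np.mean(mapped - np.float32(src_pts).reshape(-1, 1, 2), axis=0))
print(np.mean(dst_pts - src_pts, axis=0))
```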

I decomposed the homography matrix with the following command:

num, Rs, Ts, Ns = cv2.decomposeHomographyMat(M, camera_matrix)

This returns the number of solutions (num, here 4), a list of rotation matrices (Rs), a list of translation vectors (Ts), and a list of plane normals (Ns).

I'm interested in the translation matrix.

Firstly, the translation output Ts lists the 4 candidate solutions (is this correct?):

Ts = [array([[-0.60978834], [-0.26268874], [ 0.01638967]]),
      array([[ 0.60978834], [ 0.26268874], [-0.01638967]]),
      array([[-0.19035409], [-0.06628793], [ 0.63284046]]),
      array([[ 0.19035409], [ 0.06628793], [-0.63284046]])]
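For reference, the four (R, t, n) candidates can be looped over like this; preferring the candidate whose plane normal points most nearly along the camera's optical axis is only an illustrative heuristic, not something established above:

```python
import numpy as np

# Rs, Ts, Ns from cv2.decomposeHomographyMat are lists of 3x3 rotation
# matrices, 3x1 translation vectors and 3x1 plane normals.
for i, (R, t, n) in enumerate(zip(Rs, Ts, Ns)):
    print(f"candidate {i}: t = {t.ravel()}, n = {n.ravel()}")

# Illustrative heuristic only: keep the candidate whose plane normal points
# most nearly along the camera's optical axis [0, 0, 1].
best = max(range(len(Ns)), key=lambda i: float(np.dot(Ns[i].ravel(), [0.0, 0.0, 1.0])))
print("selected candidate:", best, "t =", Ts[best].ravel())
```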

Secondly, and most puzzling, each of the solutions has 3 values...

e.g., the first solution: [-0.6097, -0.2626, 0.01638967].

My understanding is that a translation matrix would have the form of: translation matrix

Here is my reference

How do I get from the values returned by the decomposition to a translation matrix in the form above? **I.e., how do I convert this: [-0.6097, -0.2626, 0.01638967]

to this format:** translation matrix

Thanks for your help.

IMHO, the solution is for 3D space, so your camera can move in tx, ty and tz. Similarly, a 3D rotation has 3 degrees of freedom (a 3D rotation axis, i.e. a normalized vector with dof = 2, plus an angle). – Micka
Hi Micka, I thought this was the case at first also. However, the M matrix is 2D, the image is 2D, and solutions are usually determined using homogeneous coordinates, I believe. Any movement of the camera on the Z axis would result in an enlargement of the image. A third axis does not seem meaningful. My main question remains: how is this 3x1 matrix (vector) equivalent to the 3x2 translation matrix? – userX
"This function extracts relative camera motion between two views observing a planar object from the homography H induced by the plane" - from my perspective this is camera motion in 3D space. If the camera moves in the z direction you might see a scaling in the M matrix. Decomposing a homography into its 2D transformation elements would give you: scaling, shearing, rotation, translation and the perspective part. – Micka
Thanks for that perspective. There will be a component in the z direction. However, the question remains: how do you convert this 3x1 vector into a translation matrix (3x2)? – userX
Further to that comment, if I take the three values as a 3D displacement vector, it does not seem to make sense. For example, -0.6097 * 640 pixels gives a lateral move of approx -390 pixels in that direction. However, subtracting the x values of the source and destination match points yields an actual move of -305 pixels. These are very different when you plot them, and not accounted for by the very small rotation observed, where the diagonal of the rotation matrix is > 0.9994. Surely there is an equivalent translation matrix (3x2) that would make clear what this 3-value vector actually is. – userX

1 Answer


Let's take your first translation vector:

np.array([-0.60978834, -0.26268874, 0.01638967])

To me it looks like those are your estimated tx, ty and tz translation components. Those quantities also make sense when I look at the image with the green dots. So I guess that your translation matrix in homogeneous coordinates would be:

M = np.array([[1, 0, 0, -0.60978834], [0, 1, 0, -0.26268874], [0, 0, 1, 0.01638967]])

Or simply:

M = np.array([[1, 0, -0.60978834], [0, 1, -0.26268874]])

if you ignore the tz component. Isn't this what you're looking for?
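For illustration, here is a minimal sketch of dropping that 2x3 matrix into cv2.warpAffine. Keep in mind that, as far as I can tell, the decomposition values are in normalized units rather than pixels, so this only demonstrates the matrix format, not a pixel-accurate re-alignment:

```python
import cv2
import numpy as np

# 2x3 affine matrix from above: identity rotation plus (tx, ty) of the first
# decomposition candidate, with tz dropped.
T = np.array([[1, 0, -0.60978834],
              [0, 1, -0.26268874]], dtype=np.float32)

# warpAffine treats the last column as a shift in pixels, so this is only a
# format demonstration using the decomposition's (non-pixel) values.
shifted = cv2.warpAffine(img1, T, (640, 480))
```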