3
votes

I have been reading Programming Computer Vision with Python by Jan Erik Solem, which is a pretty good book. However, I haven't been able to resolve a question about image registration.

Basically, we have a bunch of images (faces) that need to be aligned a bit, so the first step is to perform a rigid transformation, expressed as a similarity transformation:

x' = | sR t | x
     | 0  1 |

where x is the vector (a set of coordinates in this case) to be transformed into x' via a rotation R, a translation t, and possibly a scaling s.
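To make the notation concrete, here is a minimal NumPy sketch of that similarity transformation applied to one homogeneous 2D point. The helper name similarity_matrix and the sample values are made up for illustration; they are not from the book.

```python
import numpy as np

def similarity_matrix(s, theta, t):
    """Build the 3x3 homogeneous matrix [[s*R, t], [0, 1]] for 2D points.

    Hypothetical helper for illustration only (not from the book).
    """
    c, sn = np.cos(theta), np.sin(theta)
    R = np.array([[c, -sn], [sn, c]])  # counter-clockwise rotation by theta
    H = np.eye(3)
    H[:2, :2] = s * R                  # scaled rotation block
    H[:2, 2] = t                       # translation column
    return H

# Scale by 2, rotate by 90 degrees, then translate by (1, 0).
H = similarity_matrix(2.0, np.pi / 2, [1.0, 0.0])
x = np.array([1.0, 0.0, 1.0])          # the point (1, 0) in homogeneous form
x_new = H @ x                          # approximately (1, 2, 1)
```

The trailing 1 in x is what lets a single matrix product apply the translation together with the rotation and scaling.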

Solem calculates this rigid transformation for each image, which returns the rotation matrix R and the translation components tx and ty:

R,tx,ty = compute_rigid_transform(refpoints, points)

However, he reorders the elements of R for some reason:

T = array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]])

and later he performs an affine transformation:

im2[:,:,i] = ndimage.affine_transform(im[:,:,i],linalg.inv(T),offset=[-ty,-tx])

In this example, this affine transformation is performed on each channel but that's not relevant. im[:,:,i] is the image to be processed and this procedure returns another image.

What is T and why are we inverting that matrix in the affine transformation? And what are the usual steps to achieve image registration?

Update

Here you can find the relevant part of this code in Google Books. Starts at the bottom of page 67.

2
I'm not sure what's going on in the reordering of R (for starters, the rotation matrix in 3D should be 3x3), but in general the inverse of the rotation matrix will "undo" the rotation (just as the negatives of the translations will "undo" the translations). Maybe an example of R and the resultant T would help. - beaker
That's what I thought, but I made a mistake on matrix T. It should be: T = array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]]) I don't know if an example of R would help since it contains a bunch of numbers. As you know, R is [[cos(t) -sin(t)] [sin(t) cos(t)]] and T should be [[cos(t) sin(t)] [-sin(t) cos(t)]] - Robert Smith
Are you sure you have the T array right? The change of basis should be the inverse of R-transpose. (That would have been sooo much easier in LaTeX...) - beaker
I added an update with the relevant part. It looks like T is correct. Anyway, why should there be T? - Robert Smith
It looks like an error in the code to me. T appears to just be the transpose of R, which for a rotation matrix is the same as the inverse. Then he takes the inverse (again) in the call to ndimage.affine_transform. I think it should be either T or linalg.inv(R) passed to that function. - aganders3

2 Answers

1
votes

It looks like an error in the code to me. T appears to just be the transpose of R, which for a rotation matrix is the same as the inverse. Then he takes the inverse (again) in the call to ndimage.affine_transform. I think it should be either T or linalg.inv(R) passed to that function.
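A quick numerical check of that observation, assuming R is a pure rotation with unit scale (the angle below is arbitrary). If R also carried a scale factor, the transpose and the inverse would no longer coincide, so treat this as a sketch of the unit-scale case only.

```python
import numpy as np

theta = 0.3  # arbitrary angle, for illustration
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# The reordering from the book's code.
T = np.array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]])

# For a pure rotation the diagonal entries are equal, so T matches R.T,
# and R.T is the inverse of R.
same_as_transpose = np.allclose(T, R.T)
same_as_inverse = np.allclose(T, np.linalg.inv(R))

# Inverting T therefore gives back R itself.
inverse_of_T_is_R = np.allclose(np.linalg.inv(T), R)
```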

0
votes

I will try to answer your question and point out what looks like a mistake in the book.

(1) Why use T = array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]])? Because R,tx,ty = compute_rigid_transform(refpoints, points) computes the rotation matrix and translation in the form:

|x'| = s|R[0][0] R[0][1]||x| + |tx|             Equation (1)
|y'|    |R[1][0] R[1][1]||y|   |ty|

However, OUT = ndimage.affine_transform(IN,A,b) expects coordinates in (y,x) order, not (x,y) order. So Equation (1) becomes

|y'| = s|R[1][1] R[1][0]||y| + |ty| = T|y| + |ty|        Equation (2)
|x'|    |R[0][1] R[0][0]||x|   |tx|    |x|   |tx|

Then, in the call to ndimage.affine_transform(), the matrix to pass is linalg.inv(T), not linalg.inv(R).
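The reordering is just a swap of both axes, which can be checked numerically. The slicing shortcut R[::-1, ::-1] below is my own shorthand rather than something from the book, and the matrix and point are arbitrary test values.

```python
import numpy as np

# An arbitrary 2x2 matrix in (x, y) order; it does not need to be a rotation.
R = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# The book's reordering, written out element by element.
T = np.array([[R[1][1], R[1][0]], [R[0][1], R[0][0]]])

# Reversing both rows and columns produces the same matrix.
T_sliced = R[::-1, ::-1]

# Applying R to (x, y) and T to (y, x) gives the same point,
# only with the coordinates listed in opposite order.
xy = np.array([5.0, 7.0])
yx = xy[::-1]
out_xy = R @ xy
out_yx = T @ yx
```

Here out_yx equals out_xy with its two entries swapped, which is exactly the change from Equation (1) to Equation (2).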

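(2) The affine transform OUT = ndimage.affine_transform(IN,A,b) in fact computes A*OUT + b => IN; that is, each output coordinate is mapped back to the input coordinate where its value is sampled. Solving Equation (2) for the input coordinates gives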

|y| = inv(T)|y'| - inv(T)|ty|
|x|         |x'|         |tx|

So the offset passed to ndimage.affine_transform() should be inv(T)[-ty, -tx], not [-ty, -tx]. I think this is a bug in the original code.
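One way to check this is to push a single marker pixel through the transform and see where it lands. The sketch below uses made-up values: a 90-degree rotation for T, chosen so everything stays on the pixel grid, and an arbitrary translation. It relies only on the documented behaviour that scipy.ndimage.affine_transform samples the input at matrix @ o + offset for each output coordinate o.

```python
import numpy as np
from scipy import ndimage

# Forward transform in (row, col) order: p' = T @ p + t  (values are illustrative)
T = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
t = np.array([3.0, 25.0])              # (ty, tx)

im = np.zeros((41, 41))
im[10, 12] = 1.0                       # single marker pixel

# Where the marker should end up under the forward transform.
expected = T @ np.array([10.0, 12.0]) + t

# affine_transform needs the inverse mapping: p = inv(T) @ p' - inv(T) @ t
A = np.linalg.inv(T)
out = ndimage.affine_transform(im, A, offset=A @ (-t), order=1)

landed = np.unravel_index(np.argmax(out), out.shape)
```

With these values the marker lands on the pixel given by expected. Passing offset=[-ty, -tx] instead only gives the same result when inv(T) leaves the translation unchanged, for example when T is the identity.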