0 votes

I want to perform 360° panorama stitching for 6 fisheye cameras.

In order to find the relation between the cameras I need to compute the homography matrix. The latter is usually computed by finding and matching features in the images.

However, for my camera setup I already know:

  • The intrinsic camera matrix K, which I computed through camera calibration.
  • Extrinsic camera parameters R and t. The camera orientations are fixed and do not change at any point. The cameras are located on a circle of known diameter d, each camera shifted by 60° from its neighbors along the circle (a sketch of this geometry follows the list).
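
For concreteness, here is a small Python/NumPy sketch of how such a ring of extrinsics could be written down. The outward-facing orientation and the axis conventions are my assumptions about the setup, not part of the calibration output:

    import numpy as np

    # Sketch of the ring geometry described above. Assumptions (not from the
    # calibration itself): cameras face radially outward, the world frame sits
    # at the circle center with z up, and each camera uses an OpenCV-style
    # frame (x right, y down, z = optical axis).
    def ring_extrinsics(i, n=6, d=1.0):
        """R, t of camera i (0..n-1) w.r.t. the world frame, X_cam = R @ X_world + t."""
        theta = 2.0 * np.pi * i / n                 # 60 degree steps for n = 6
        c, s = np.cos(theta), np.sin(theta)
        C = 0.5 * d * np.array([c, s, 0.0])         # camera center on the circle
        z_axis = np.array([c, s, 0.0])              # optical axis, pointing outward
        y_axis = np.array([0.0, 0.0, -1.0])         # "down"
        x_axis = np.cross(y_axis, z_axis)           # completes a right-handed frame
        R = np.column_stack([x_axis, y_axis, z_axis]).T   # world -> camera rotation
        t = -R @ C
        return R, t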

Therefore, I think I could compute the homography matrix manually, which I assume would be more accurate than feature matching.

In the literature I found the following formula for the homography matrix relating image 2 to image 1:

H_2_1 = (K_2) * (R_2)^-1 * R_1 * K_1

This formula only takes into account the rotation between the cameras, not the translation vector that exists in my case.

How could I plug the translation t of each camera in the computation of H?

I have already tried computing H without considering the translation, but since d > 1 meter, the images are not accurately aligned in the panorama.

EDIT:

Based on Francesco's answer below, I have the following questions:

  • After calibrating the fisheye lenses, I got a matrix K with a focal length of f = 620 for an image of size 1024 x 768. Is that considered a big or a small focal length?

  • My cameras are located on a circle with a diameter of 1 meter. The explanation below makes it clear to me that, due to this "big" translation between the cameras, I get noticeable ghosting effects for objects that are relatively close to them. Therefore, if the homography model cannot fully represent the position of the cameras, is it possible to use another model, like the fundamental/essential matrix, for image stitching?

There is no "big" or "small" in absolute terms; it depends on how far away the objects in the scene you want to look at are. 2 * atan(512/620) ~ 100 deg; are you sure these lenses are fisheye? It is certainly possible to stitch with models other than a simple homography. You may want to look into the panotools software. – Francesco Callari

@FrancescoCallari my cameras have a view similar to the following picture (the one at the top): upload.wikimedia.org/wikipedia/commons/2/2c/Panotools5618.jpg. I got those focal length values from the K matrix calculated with OpenCV's fisheye camera calibration sample code; are they not what one would expect for a fisheye camera? I am developing a real-time stitcher and I am working mainly with OpenCV. Could you tell me which other models there are that could represent the translation, so I can do some research on them? Thanks again! – makolele12

1 Answer

2 votes

You cannot "plug" the translation in: its presence along with a nontrivial rotation mathematically implies that the relationship between images is not a homography.

However, if the imaged scene is and appears "far enough" from the cameras, i.e. if the translations between the cameras are small compared to the distances of the scene objects from the cameras, and the cameras' focal lengths are small enough, then you may use the homographies induced by the pure rotations as approximations.
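
As a rough way to quantify "far enough": under a pinhole approximation, the pixel parallax caused by ignoring a baseline b for a scene point at depth Z is about f * b / Z. A minimal numeric sketch, using the values from the question (f ~ 620 px, adjacent cameras 60° apart on a 1 m circle) and arbitrary example depths:

    import numpy as np

    f = 620.0                          # focal length in pixels (from the question)
    b = 2 * 0.5 * np.sin(np.pi / 6)    # chord between adjacent cameras: ~0.5 m
    for Z in (2.0, 10.0, 50.0):        # example object depths in meters
        print(f"Z = {Z:5.1f} m -> ~{f * b / Z:5.1f} px of parallax")

At 50 m the residual is a few pixels, but anything within a few meters of the rig is off by tens to hundreds of pixels, which matches the ghosting described in the question.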

Your equation is wrong. The correct formula is obtained as follows:

  • Take a pixel in camera 1: p_1 = (x, y, 1) in homogeneous coordinates
  • Back project it into a ray in 3D space: P_1 = inv(K_1) * p_1
  • Express the ray in the coordinate frame of camera 2: P_2 = R_2_1 * P_1
  • Project the ray into a pixel in camera 2: p_2 = K_2 * P_2
  • Put the equations together: p_2 = [K_2 * R_2_1 * inv(K_1)] * p_1

The product H = K_2 * R_2_1 * inv(K_1) is the (approximate) homography. The rotation R_2_1 transforms points from frame 1 into frame 2. It is a 3x3 matrix whose columns are the components of the x, y, z axes of frame 1 decomposed in frame 2. If your setup gives you the rotations of all the cameras with respect to a common frame 0, i.e. as R_i_0, then R_2_1 = R_2_0 * transpose(R_1_0).
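
A minimal NumPy/OpenCV sketch of this recipe (the variable names are illustrative; K1, K2 and the per-camera rotations are assumed to come from your calibration):

    import numpy as np
    import cv2

    def homography_2_1(K1, R_1_0, K2, R_2_0):
        """Approximate homography mapping pixels of camera 1 into camera 2."""
        R_2_1 = R_2_0 @ R_1_0.T                  # frame 1 -> frame 2 rotation
        return K2 @ R_2_1 @ np.linalg.inv(K1)

    # Example usage: warp image 1 into camera 2's pixel grid (translation ignored).
    # H = homography_2_1(K1, R_1_0, K2, R_2_0)
    # warped = cv2.warpPerspective(img1, H, (img2.shape[1], img2.shape[0]))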

Generally speaking, you should use the above homography as an initial estimate, to be refined by matching points and optimizing. This is because (a) the homography model itself is only an approximation (since it ignores the translation), and (b) the rotations given by the mechanical setup, even a calibrated one, are affected by errors. Using matched pixels to optimize the transformation will minimize the errors where they matter: on the image, rather than in an abstract rotation space.
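
As one concrete way to do that refinement (a sketch, not the only option), you can match features in the overlap and re-estimate the homography robustly, using the analytic H above as a sanity check on the result:

    import numpy as np
    import cv2

    # img1, img2 are two overlapping camera images; the file names below
    # are placeholders.
    img1 = cv2.imread("cam1.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("cam2.png", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(2000)                    # detect and describe features
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]

    pts1 = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts2 = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC rejects matches inconsistent with a single homography.
    H_refined, inliers = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)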