
Is there any particular reason why we need multiple poses (e.g. varying z or rotation) to obtain the focal length and principal point for the camera matrix? In other words, is it sufficient to calibrate a pinhole camera with a single pose? i.e. by keeping the location of the calibration object (let's say a standard checkerboard) constant?

Please visit "What topics can I ask about here?" - afzalex
This question is discussed in "Learning OpenCV" by Gary Bradski and Adrian Kaehler on page 388. Short answer: No, a single image is not enough. You'll need at least 10 images of a 7x8 chessboard (or larger). - Claus

1 Answer


I assume you are asking about OpenCV-style camera calibration using images of a planar target. The reference for the algorithm OpenCV implements is Z. Zhang's now-classic paper "A Flexible New Technique for Camera Calibration". The discussion in the top half of page 6 shows that n >= 3 images are necessary to calibrate all 5 parameters of a pinhole camera matrix. Imposing constraints on the parameters reduces the number of needed images, down to a theoretical minimum of one.
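For intuition, Zhang's closed-form initialization step can be sketched on synthetic, noise-free homographies. This is a minimal numpy-only sketch: the intrinsics and poses below are made up, and it omits the distortion model and the final nonlinear refinement that a real pipeline (e.g. OpenCV's calibrateCamera) performs:

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def vij(H, i, j):
    # Zhang's v_ij row, built from columns i and j of homography H.
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def intrinsics_from_homographies(Hs):
    # Each homography gives two linear constraints on b = (B11,B12,B22,B13,B23,B33),
    # the entries of B = K^-T K^-1; with n >= 3 views, b is the null vector of V.
    V = []
    for H in Hs:
        V.append(vij(H, 0, 1))                # h1^T B h2 = 0
        V.append(vij(H, 0, 0) - vij(H, 1, 1)) # h1^T B h1 = h2^T B h2
    _, _, Vt = np.linalg.svd(np.array(V))
    B11, B12, B22, B13, B23, B33 = Vt[-1]     # smallest singular vector
    # Closed-form recovery of the intrinsics from B (Zhang, Appendix B).
    v0 = (B12 * B13 - B11 * B23) / (B11 * B22 - B12 ** 2)
    lam = B33 - (B13 ** 2 + v0 * (B12 * B13 - B11 * B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11 * B22 - B12 ** 2))
    gamma = -B12 * alpha ** 2 * beta / lam
    u0 = gamma * v0 / beta - B13 * alpha ** 2 / lam
    return np.array([[alpha, gamma, u0], [0.0, beta, v0], [0.0, 0.0, 1.0]])

# Made-up ground truth and three tilted (non-degenerate) poses.
K_true = np.array([[800.0, 0.0, 320.0], [0.0, 820.0, 240.0], [0.0, 0.0, 1.0]])
poses = [(rot_x(0.3) @ rot_y(0.2), [0.1, -0.1, 2.0]),
         (rot_x(-0.25) @ rot_y(0.4), [-0.2, 0.1, 2.5]),
         (rot_x(0.5) @ rot_y(-0.3), [0.0, 0.2, 1.8])]
# Plane-to-image homography for a target on z=0: H = K [r1 r2 t].
Hs = [K_true @ np.column_stack([R[:, 0], R[:, 1], t]) for R, t in poses]
K_est = intrinsics_from_homographies(Hs)
print(np.round(K_est, 3))  # matches K_true up to numerical precision
```

With exact data three views suffice, as the paper says; with real, noisy corner detections you need many more images, plus the iterative refinement.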

In practice you need more images, for several reasons, among them:

  • You need enough measurements to average out noise and spurious corner-detection errors, while using a practical target with well-separated corners.
  • Merely measuring data is not the same as observing (i.e. constraining) the model parameters: not every image pose informs every parameter.
  • Physical lenses have practical limitations, e.g. a finite depth of field.

As an example of the second point, the ideal target pose for calibrating the nonlinear lens distortion (barrel, pincushion, tangential, etc.) is fronto-parallel and covering the whole field of view, because it yields a large number of well-separated, aligned corners spread over the image, all with approximately the same amount of blur. However, this is exactly the worst pose for estimating the field of view / focal length, because that estimate requires observing significant perspective foreshortening.
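The focal-length/distance ambiguity of a frontal pose can be seen numerically. In this sketch (all numbers are made up: a hypothetical 7x8 grid of 30 mm squares and an assumed principal point), scaling the focal length and the target distance by the same factor leaves a frontal image unchanged, while a tilted target breaks the ambiguity:

```python
import numpy as np

# Hypothetical planar target: 7x8 grid of corners, 30 mm square size.
xs, ys = np.meshgrid(np.arange(7), np.arange(8))
pts = np.stack([xs.ravel() * 0.03, ys.ravel() * 0.03], axis=1)

PP = np.array([320.0, 240.0])  # assumed principal point

def project_frontal(pts, f, z):
    # Frontal target at distance z: every corner has the same depth.
    return f * pts / z + PP

def project_tilted(pts, f, z, tilt=0.5):
    # Target rotated about the y-axis: depth now varies across the board.
    xc = np.cos(tilt) * pts[:, 0]
    zc = z + np.sin(tilt) * pts[:, 0]
    u = f * xc / zc + PP[0]
    v = f * pts[:, 1] / zc + PP[1]
    return np.stack([u, v], axis=1)

# Frontal: f and z scaled by the same factor -> the same image (only f/z is observable).
a = project_frontal(pts, f=800.0, z=2.0)
b = project_frontal(pts, f=1200.0, z=3.0)
print(np.abs(a - b).max())  # ~0 (machine precision)

# Tilted: foreshortening makes the two hypotheses distinguishable.
c = project_tilted(pts, f=800.0, z=2.0)
d = project_tilted(pts, f=1200.0, z=3.0)
print(np.abs(c - d).max())  # clearly nonzero
```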

Likewise, it is possible to show that the location of the principal point is well constrained by a set of images showing the vanishing points of multiple pencils of parallel lines. This matters because the principal point's location is inherently confounded with the component of the camera-target relative motion that is parallel to the image plane. The vanishing points thus help "guide" the optimizer toward the correct solution, in the common case where the target does translate w.r.t. the camera.