6 votes

I am using OpenCV to calibrate my webcam. I have fixed the webcam to a rig so that it stays static, moved a chessboard calibration pattern in front of the camera, and used the detected points to compute the calibration, as in the standard OpenCV examples (https://docs.opencv.org/3.1.0/dc/dbb/tutorial_py_calibration.html).

Now, this gives me the camera intrinsic matrix, plus a rotation and translation for each chessboard view that maps points from the chessboard's coordinate frame into the camera frame.

However, what I am interested in is the global extrinsic matrix: once I have removed the chessboard, I want to be able to specify a point in the image, i.e. (x, y) plus its height, and get back its position in world space. As far as I understand, I need both the intrinsic and the extrinsic matrix for this. How should one proceed to compute the extrinsic matrix from here? Can I reuse the measurements I have already gathered during the chessboard calibration step to compute the extrinsic matrix as well?


1 Answer

6 votes

Let me give some context. Consider the following picture (from https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html):

[Figure: pinhole camera model showing the camera reference frame (Xc, Yc, Zc), the image plane with pixel coordinates (u, v), and a world frame related to the camera by a rotation R and translation T.]

The camera has a rigid reference frame (Xc, Yc, Zc) attached to it. The intrinsic calibration that you successfully performed allows you to convert a point (Xc, Yc, Zc) into its projection on the image (u, v), and a point (u, v) in the image into a ray in (Xc, Yc, Zc) space (you can only recover the point along that ray up to a scale factor).
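For concreteness, here is a minimal NumPy sketch of both directions; the intrinsic matrix K below is a made-up placeholder, and lens distortion is ignored:

    import numpy as np

    # Placeholder intrinsic matrix (use the one from your calibration).
    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])

    # Camera-frame point (Xc, Yc, Zc) -> image projection (u, v):
    Xc = np.array([0.1, -0.05, 2.0])
    uvw = K @ Xc                             # homogeneous image coordinates
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]  # pixel coordinates

    # Image point (u, v) -> ray in the camera frame (up to scale):
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Every point s * ray with s > 0 projects to the same pixel (u, v).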

In practice, you want to place the camera in an external "world" reference frame, let's call it (X,Y,Z). Then there is a rigid transformation, represented by a rotation matrix, R, and a translation vector T, such that:

|Xc|     |X|
|Yc| = R |Y| + T
|Zc|     |Z|

That's the extrinsic calibration (which can also be written as a 4x4 matrix; that's what you call the extrinsic matrix).
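As a side note, assembling that 4x4 matrix from R and T is straightforward; a small NumPy sketch, assuming you already have R (3x3) and T (3x1):

    import numpy as np

    def extrinsic_matrix(R, T):
        # 4x4 extrinsic matrix: maps world coordinates to camera coordinates.
        E = np.eye(4)
        E[:3, :3] = R          # rotation block
        E[:3, 3] = T.ravel()   # translation column
        return E

    # The inverse (camera -> world) is the inverse rigid transform:
    # X = R.T (Xc - T), i.e. a 4x4 with blocks [R.T | -R.T T].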

Now, the answer. To obtain R and T, you can do the following:

  1. Fix your world reference frame: for example, the ground can be the (X,Y) plane, and choose an origin for it.

  2. Set some points with known coordinates in this reference frame, for example points on a square grid on the floor.

  3. Take a picture and get the corresponding 2D image coordinates.

  4. Use solvePnP to obtain the rotation and translation, with the following parameters:

    • objectPoints: the 3D points in the world reference frame.
    • imagePoints: the corresponding 2D points in the image in the same order as objectPoints.
    • cameraMatrix: the intrinsic matrix you already have.
    • distCoeffs: the distortion coefficients you already have.
    • rvec, tvec: these will be the outputs.
    • useExtrinsicGuess: false
    • flags: you can use CV_ITERATIVE (cv2.SOLVEPNP_ITERATIVE in the Python API)
  5. Finally, get R from rvec with the Rodrigues function (see the sketch after this list).
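A minimal Python sketch of steps 4-5; the 3D-2D correspondences and the intrinsic parameters below are placeholders, so substitute your own measurements:

    import numpy as np
    import cv2

    # Hypothetical world points (e.g. marks on the floor, so z = 0) and
    # their measured pixel coordinates, in the same order.
    object_points = np.array([[0., 0., 0.], [1., 0., 0.], [1., 1., 0.],
                              [0., 1., 0.], [2., 1., 0.], [2., 2., 0.]],
                             dtype=np.float32)
    image_points = np.array([[405., 305.], [585., 308.], [580., 195.],
                             [410., 198.], [750., 190.], [745., 120.]],
                            dtype=np.float32)

    # From your intrinsic calibration (placeholder values here).
    camera_matrix = np.array([[800., 0., 320.],
                              [0., 800., 240.],
                              [0., 0., 1.]])
    dist_coeffs = np.zeros(5)  # use your real distortion coefficients

    ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                                  camera_matrix, dist_coeffs,
                                  flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    # A world point X now maps into the camera frame as Xc = R·X + t.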

You will need at least 3 non-collinear points with corresponding 3D-2D coordinates for solvePnP to work (link), but more is better. To get good quality points, you could print a big chessboard pattern, lay it flat on the floor, and use it as the grid (see the sketch below). What's important is that the pattern is not too small in the image: the larger it appears, the more stable your calibration will be.
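If you go the floor-chessboard route, collecting the points could look like this. A sketch only: the pattern size, square size, and image file name are assumptions, and the intrinsic parameters are placeholders as before:

    import numpy as np
    import cv2

    square_size = 0.10  # real square size in your world units (assumed 10 cm)
    pattern = (7, 6)    # inner-corner count of the printed board (assumed)

    # World coordinates of the corners: the board lies on the floor, z = 0.
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = (square_size *
                   np.mgrid[0:pattern[0], 0:pattern[1]]).T.reshape(-1, 2)

    camera_matrix = np.array([[800., 0., 320.],
                              [0., 800., 240.],
                              [0., 0., 1.]])  # placeholder intrinsics
    dist_coeffs = np.zeros(5)                 # placeholder distortion

    img = cv2.imread('floor_view.png')  # hypothetical file name
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
        ok, rvec, tvec = cv2.solvePnP(objp, corners,
                                      camera_matrix, dist_coeffs)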

And, very important: for the intrinsic calibration you used a chessboard pattern with squares of a certain size, but you told the algorithm (which internally does something like a solvePnP for each pattern view) that the size of each square is 1. This is not explicit, but it is done in line 10 of the sample code, where the grid is built with coordinates 0,1,2,...:

objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
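For reference, in the tutorial that line is preceded by the allocation of objp, so the grid is a 7x6 set of 3D points lying on the plane z = 0 with unit spacing:

    # From the tutorial: 3D corner coordinates 0, 1, 2, ... with no
    # physical scale attached (implicit square size of 1).
    objp = np.zeros((6*7, 3), np.float32)
    objp[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)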

And the scale of the world for the extrinsic calibration must match this, so you have several possibilities:

  1. Use the same scale, for example by using the same grid or by measuring the coordinates of your "world" plane in the same scale. In this case, your "world" won't be at the right scale.

  2. Recommended: redo the intrinsic calibration with the right scale, something like:

    objp[:,:2] = (size_of_a_square*np.mgrid[0:7,0:6]).T.reshape(-1,2)

    where size_of_a_square is the real size of a square.

  3. (I haven't done this, but it is theoretically possible; do it if you can't do 2.) Reuse the intrinsic calibration by scaling fx and fy. This is possible because the camera sees everything up to a scale factor, and the declared size of a square only changes fx and fy (and the T in the pose for each square, but that's another story). If the actual size of a square is L, then replace fx and fy with L*fx and L*fy before calling solvePnP, as sketched below.
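A literal sketch of that substitution (untested, in line with the caveat above; L and the camera matrix are placeholders):

    import numpy as np

    camera_matrix = np.array([[800., 0., 320.],
                              [0., 800., 240.],
                              [0., 0., 1.]])  # intrinsics calibrated with
                                              # square size declared as 1
    L = 0.025                        # actual square size, e.g. 2.5 cm (assumed)
    K_scaled = camera_matrix.copy()
    K_scaled[0, 0] *= L              # fx -> L * fx
    K_scaled[1, 1] *= L              # fy -> L * fy
    # Pass K_scaled instead of camera_matrix to solvePnP.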