1 vote

I have a stereo camera setup with two webcams that I am using with MATLAB. I calibrate the cameras and get the stereoParams object.

Then I want a user to be able to select a point in a picture and get that point's real-world 3-D location. I know that for this I need the baseline, the focal length, and the pixel disparity. I have the pixel disparity, but how do I get the baseline and focal length? Can the baseline be calculated from the stereoParams?


3 Answers

1 vote

I am not familiar with the MATLAB stereo camera calibration functions, but in general, once you calibrate each camera and find the fundamental matrix, you should be able to do the following:

  1. Set one of the images as the reference and rectify the other image so that the disparity search proceeds along horizontal lines in the image.
  2. From the pixel disparity you can calculate the real-world depth via the relation z = fB/d, where f is the focal length, B is the baseline, and d is the disparity. It is very important to mind the units: if d is in pixels, then f must also be in pixels if you want z to come out in the units of the baseline (e.g. centimeters). See the sketch after this list.
  3. The baseline is the distance between the optical centers of the two cameras. It should be available as the norm of the translation vector stereoParameters.TranslationOfCamera2 in MATLAB.
  4. The focal length is an intrinsic parameter of each camera. I assumed equal focal lengths above, but with webcams this is not guaranteed. You should be able to extract the focal length from the MATLAB cameraParameters.IntrinsicMatrix. The focal length corresponds to the alpha parameters in the intrinsic matrix (see the Wikipedia article on camera resectioning for an explanation).
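
A minimal sketch of steps 2-4 in MATLAB, assuming a stereoParameters object named stereoParams from the Stereo Camera Calibrator and a hypothetical pixel disparity value d:

    % Baseline: magnitude of the translation from camera 1 to camera 2,
    % in the world units used during calibration (e.g. millimeters)
    B = norm(stereoParams.TranslationOfCamera2);

    % Focal length in pixels, taken from the left camera's intrinsics
    % (the (1,1) entry of IntrinsicMatrix is alpha_x)
    f = stereoParams.CameraParameters1.IntrinsicMatrix(1,1);

    % Depth from disparity: z comes out in the units of B
    z = f * B / d;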

0 votes

The "pixel" disparity is defined in rectified image coordinates. However, as your real cameras will not normally be exactly parallel and row-aligned, there is a non-identity transformation that rectifies your input camera images. Therefore you need to "undo" the rectification in order to find the pixel in the other image corresponding to a given one. The procedure is as follows:

  1. User selects a point in, say, the left image, giving you a pair of image coordinates (xl, yl).
  2. Apply the left rectification transform to them, obtaining their corresponding left rectified image coordinates. If you are using one of the common linear rectification methods, this is (xlr, ylr, wlr)' = Hlr * (xl, yl, 1)' , where Hlr is the left rectification homography.
  3. Look up the disparity map at (xlr / wlr, ylr / wlr), obtaining the pixel's disparity value d (here I assume that your stereo algorithm yields a left-to-right disparity map for the X coordinate).
  4. The matching point in the right rectified image is then (xrr, yrr) = (d + xlr / wlr, ylr / wlr)
  5. Apply the inverse of the right rectification transform to get the corresponding pixel in right image coordinates, (xr, yr, wr)' = Hrr^-1 * (xrr, yrr, 1)', then divide by wr to obtain the final pixel location (xr / wr, yr / wr).
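
A minimal sketch of steps 1-5 in MATLAB, assuming linear rectification with known 3x3 homographies Hlr and Hrr and a left-to-right disparity map D in rectified coordinates (all variable names here are placeholders, not a specific toolbox API):

    p = Hlr * [xl; yl; 1];            % step 2: rectify the selected point
    xlr = p(1)/p(3);  ylr = p(2)/p(3);

    d = D(round(ylr), round(xlr));    % step 3: disparity lookup (row, column)

    xrr = xlr + d;  yrr = ylr;        % step 4: match in the right rectified image

    q = Hrr \ [xrr; yrr; 1];          % step 5: inverse right rectification
    xr = q(1)/q(3);  yr = q(2)/q(3);  % final right-image pixel coordinates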

Note that all these operations need to be performed only once per pixel, and the results can be cached. In other words, you can pre-compute a "rectified" 2-channel disparity map that, for each pixel, yields the offset from its coordinates in one image to the corresponding pixel in the other image. The map itself can be stored as an image whose channel type depends on the disparity range; usually a short integer is enough, as it can represent offsets of ±32K pixels.
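
Continuing the sketch above, a hypothetical way to pre-compute that 2-channel offset map (same placeholder names Hlr, Hrr, and D as before):

    [H, W] = size(D);
    offsets = zeros(H, W, 2, 'int16');   % short integers cover +/-32K offsets
    for y = 1:H
        for x = 1:W
            p = Hlr * [x; y; 1];
            xlr = p(1)/p(3);  ylr = p(2)/p(3);
            r = min(max(round(ylr), 1), H);   % clamp the lookup to the map
            c = min(max(round(xlr), 1), W);
            q = Hrr \ [xlr + D(r, c); ylr; 1];
            offsets(y, x, 1) = int16(round(q(1)/q(3) - x));   % x offset
            offsets(y, x, 2) = int16(round(q(2)/q(3) - y));   % y offset
        end
    end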

0 votes

You can use the reconstructScene function, which will give you the 3-D world coordinates for every pixel with a valid disparity. For instance, the Computer Vision Toolbox's depth estimation example looks up the 3-D coordinates of the centroid of a detected person.
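
A minimal sketch of that workflow, assuming raw left/right frames I1 and I2, the stereoParams from calibration, and a hypothetical user-selected pixel (x, y) in the rectified left image:

    % Rectify, compute disparity, and back-project to 3-D world coordinates
    [J1, J2] = rectifyStereoImages(I1, I2, stereoParams);
    disparityMap = disparitySGM(rgb2gray(J1), rgb2gray(J2));
    points3D = reconstructScene(disparityMap, stereoParams);  % H-by-W-by-3, world units

    % 3-D location of the selected pixel
    xyz = squeeze(points3D(round(y), round(x), :));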