
I've put together a stereo cam rig and am having trouble using it to produce a good disparity map. Here's an example of two rectified images and the disparity map I produced with them:

[Image: rectified left and right images with the resulting disparity map]

As you can see, the results are pretty bad. Changing StereoBM's settings doesn't change much.

The setup

  • Both cameras are the same model and connect to my computer with USB.
  • They are fixed to a rigid wooden board so that they don't move. I aligned them as best I could, but of course it's not perfect. They are unable to move, so their positions during and after calibration are the same.
  • I calibrated the stereo pair using OpenCV and am using OpenCV's StereoBM class to produce the disparity map.
  • It's probably not that relevant, but I'm coding in Python.
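One quick way to tell whether calibration is the weak link: the first value cv2.stereoCalibrate returns (ret in the code further down) is the final re-projection error, an RMS distance in pixels between detected corners and the corners the fitted model projects. The sketch below shows what that number measures, assuming you already have detected corner positions and the re-projected positions (for example from cv2.projectPoints). The helper name rms_reprojection_error is hypothetical, and OpenCV's exact normalisation may differ slightly from this textbook definition.

```python
import numpy as np


def rms_reprojection_error(detected, projected):
    """Root-mean-square distance, in pixels, between detected corners and
    the corners re-projected by a calibrated camera model.

    Both arguments are (N, 2) arrays of (x, y) pixel coordinates, where row i
    of each array refers to the same chessboard corner.
    """
    detected = np.asarray(detected, dtype=np.float64).reshape(-1, 2)
    projected = np.asarray(projected, dtype=np.float64).reshape(-1, 2)
    # Squared Euclidean distance per corner, averaged, then square-rooted
    squared = np.sum((detected - projected) ** 2, axis=1)
    return float(np.sqrt(squared.mean()))


# Hypothetical example: every corner is off by 3 px in x and 4 px in y
detected = np.array([[10.0, 10.0], [20.0, 10.0]])
projected = detected + [3.0, 4.0]
print(rms_reprojection_error(detected, projected))  # 5.0
```

A value of several pixels usually points at a calibration problem (too few views, a mis-specified pattern size, or corners detected in the wrong order) rather than at the block matcher.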

Problems I could imagine

I'm doing this for the first time, so I'm far from being an expert, but I'm guessing the problem is in the calibration or in the stereo rectification rather than in the computation of the disparity map. I've tried all permutations of settings for StereoBM and, although I get different results, they're all like the disparity map shown above: patches of black and white.
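When sweeping StereoBM settings, it helps to stay within the values OpenCV accepts: the number of disparities must be a positive multiple of 16, and the block (SAD window) size must be odd and between 5 and 255. The sketch below enumerates such combinations in plain Python; the function name and the default ranges are hypothetical choices for illustration. Each pair would then be handed to the block matcher, for example cv2.StereoBM_create(numDisparities=n, blockSize=b) in OpenCV 3 or later.

```python
from itertools import product


def stereo_bm_grid(max_disparities=128, max_block_size=21):
    """Yield (num_disparities, block_size) pairs that satisfy StereoBM's
    constraints: num_disparities is a positive multiple of 16 and
    block_size is odd and in the range 5..255."""
    disparities = range(16, max_disparities + 1, 16)
    block_sizes = range(5, min(max_block_size, 255) + 1, 2)
    for num_disparities, block_size in product(disparities, block_sizes):
        yield num_disparities, block_size


grid = list(stereo_bm_grid(max_disparities=32, max_block_size=9))
print(grid)
# [(16, 5), (16, 7), (16, 9), (32, 5), (32, 7), (32, 9)]
```

If every combination in such a grid gives the same black-and-white patches, that is a hint that the problem lies upstream of the matcher (or, as it turns out in the answer below, downstream in the display).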

This idea is further supported by the fact that, as I understand it, stereo rectification should align the two pictures so that corresponding points lie on the same straight (in my case horizontal) line. If I examine both rectified pictures next to each other, it's immediately obvious that this isn't the case: corresponding points are much higher in the right picture than in the left. I'm not sure whether the calibration or the rectification is the problem, though.
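The observation about corresponding points can be turned into a number. After a correct rectification, a matched point should sit on (nearly) the same row in both images, so the mean absolute difference between the y-coordinates of matched points, for instance chessboard corners detected in the rectified pair, should be close to zero; a few pixels or more suggests the rectification is off. The sketch below uses NumPy with hypothetical helper names; the second function draws horizontal guide lines across a side-by-side image, which makes vertical misalignment easy to spot by eye.

```python
import numpy as np


def mean_vertical_offset(points_left, points_right):
    """Mean absolute y-difference (pixels) between matched points in a
    rectified pair. Both inputs are (N, 2) arrays of (x, y) coordinates,
    with row i of each array referring to the same physical point."""
    left = np.asarray(points_left, dtype=np.float64).reshape(-1, 2)
    right = np.asarray(points_right, dtype=np.float64).reshape(-1, 2)
    return float(np.mean(np.abs(left[:, 1] - right[:, 1])))


def side_by_side_with_lines(img_left, img_right, spacing=32, value=255):
    """Stack two grayscale images of equal height horizontally and draw a
    horizontal line every `spacing` rows for visual comparison."""
    stacked = np.hstack([img_left, img_right])  # hstack returns a new array
    stacked[::spacing, :] = value
    return stacked


# Hypothetical matched points: the right image is shifted 12 px horizontally
# (that is the disparity) and 3 px vertically (a rectification error)
left_pts = np.array([[100.0, 50.0], [200.0, 80.0]])
right_pts = left_pts + [-12.0, 3.0]
print(mean_vertical_offset(left_pts, right_pts))  # 3.0
```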

The code

The actual code is wrapped up in objects; in case you're interested in seeing it in its entirety, it's available on GitHub. Here is a simplified example of what's actually run (of course, in the real code I calibrate using more than just two pictures):

import cv2
import numpy as np

## Load test images
# TEST_IMAGES is a list of paths to test images
input_l, input_r = [cv2.imread(image, cv2.CV_LOAD_IMAGE_GRAYSCALE)
                    for image in TEST_IMAGES]
# OpenCV sizes are (width, height), whereas NumPy's shape is (rows, columns)
image_size = input_l.shape[::-1]

## Retrieve chessboard corners
# CHESSBOARD_ROWS and CHESSBOARD_COLUMNS are the number of inside rows and
# columns in the chessboard used for calibration
pattern_size = CHESSBOARD_ROWS, CHESSBOARD_COLUMNS
object_points = np.zeros((np.prod(pattern_size), 3), np.float32)
object_points[:, :2] = np.indices(pattern_size).T.reshape(-1, 2)
# SQUARE_SIZE is the size of the chessboard squares in cm
object_points *= SQUARE_SIZE
image_points = {}
subpix_criteria = (cv2.TERM_CRITERIA_MAX_ITER + cv2.TERM_CRITERIA_EPS,
                   30, 0.01)
for side, image in (("left", input_l), ("right", input_r)):
    # The third positional argument of findChessboardCorners is the output
    # corners array, not a flag, so it is left out here
    found, corners = cv2.findChessboardCorners(image, pattern_size)
    if not found:
        raise RuntimeError("Chessboard not found in the %s image" % side)
    # Refine the detected corners to sub-pixel accuracy (in place)
    cv2.cornerSubPix(image, corners, (11, 11), (-1, -1), subpix_criteria)
    image_points[side] = corners.reshape(-1, 2)

## Calibrate cameras
(cam_mats, dist_coefs, rect_trans, proj_mats, valid_boxes,
 undistortion_maps, rectification_maps) = {}, {}, {}, {}, {}, {}, {}
criteria = (cv2.TERM_CRITERIA_MAX_ITER + cv2.TERM_CRITERIA_EPS,
            100, 1e-5)
flags = (cv2.CALIB_FIX_ASPECT_RATIO + cv2.CALIB_ZERO_TANGENT_DIST +
         cv2.CALIB_SAME_FOCAL_LENGTH)
# stereoCalibrate expects one array per calibration view, hence the lists
(ret, cam_mats["left"], dist_coefs["left"], cam_mats["right"],
 dist_coefs["right"], rot_mat, trans_vec, e_mat,
 f_mat) = cv2.stereoCalibrate([object_points],
                              [image_points["left"]], [image_points["right"]],
                              image_size, criteria=criteria, flags=flags)
(rect_trans["left"], rect_trans["right"],
 proj_mats["left"], proj_mats["right"],
 disp_to_depth_mat, valid_boxes["left"],
 valid_boxes["right"]) = cv2.stereoRectify(cam_mats["left"],
                                           dist_coefs["left"],
                                           cam_mats["right"],
                                           dist_coefs["right"],
                                           image_size,
                                           rot_mat, trans_vec, flags=0)
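for side in ("left", "right"):
    # initUndistortRectifyMap returns the x and y lookup maps that remap uses
    # to undistort and rectify in a single step
    (undistortion_maps[side],
     rectification_maps[side]) = cv2.initUndistortRectifyMap(cam_mats[side],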
                                                           dist_coefs[side],
                                                           rect_trans[side],
                                                           proj_mats[side],
                                                           image_size,
                                                           cv2.CV_32FC1)

## Produce disparity map
rectified_l = cv2.remap(input_l, undistortion_maps["left"],
                        rectification_maps["left"],
                        cv2.INTER_NEAREST)
rectified_r = cv2.remap(input_r, undistortion_maps["right"],
                        rectification_maps["right"],
                        cv2.INTER_NEAREST)
cv2.imshow("left", rectified_l)
cv2.imshow("right", rectified_r)
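# OpenCV 2.4 API: StereoBM(preset, ndisparities, SADWindowSize)
block_matcher = cv2.StereoBM(cv2.STEREO_BM_BASIC_PRESET, 0, 5)
disp = block_matcher.compute(rectified_l, rectified_r, disptype=cv2.CV_32F)
cv2.imshow("disparity", disp)
cv2.waitKey(0)  # keep the windows open until a key is pressed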

What's going wrong here?


Answer


It turned out that the problem was the visualization and not the data itself. Somewhere I read that cv2.reprojectImageTo3D required a disparity map as floating point values, which is why I was requesting cv2.CV_32F from block_matcher.compute.

Reading the OpenCV documentation more carefully has led me to think that I was mistaken, and I'd actually rather work with integers than floats for the sake of speed. However, the documentation for cv2.imshow isn't clear on what it does with 16-bit signed integers (as opposed to 16-bit unsigned ones), so for the visualization I'm leaving the values as floats.
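For reference, here is a sketch of the integer route, based on OpenCV's documented behaviour: with the default 16-bit signed output, StereoBM stores disparities as fixed-point numbers with 4 fractional bits, so dividing by 16 recovers disparities in pixels. Those disparities, together with the disp_to_depth_mat (Q) returned by cv2.stereoRectify in the question's code, are what cv2.reprojectImageTo3D consumes: it computes [X Y Z W]^T = Q [x y d 1]^T and divides by W. The NumPy functions below are illustrative re-implementations with hypothetical names, and the Q matrix in the usage example is made up, built in the form the cv2.stereoRectify documentation gives for a horizontal rig.

```python
import numpy as np


def fixed_point_to_pixels(disp16):
    """Convert StereoBM's CV_16S output (4 fractional bits) to float
    disparities in pixels."""
    return np.asarray(disp16, dtype=np.float32) / 16.0


def reproject_pixel(Q, x, y, d):
    """Map pixel (x, y) with disparity d to a 3D point using the 4x4
    matrix Q, as reprojectImageTo3D does for every pixel."""
    X, Y, Z, W = Q @ np.array([x, y, d, 1.0])
    return np.array([X, Y, Z]) / W


def make_q(f, cx, cy, tx, cx_right=None):
    """Build Q in the form documented for cv2.stereoRectify (horizontal
    rig); tx is the x-translation between the rectified cameras."""
    cx_right = cx if cx_right is None else cx_right
    return np.array([
        [1.0, 0.0, 0.0, -cx],
        [0.0, 1.0, 0.0, -cy],
        [0.0, 0.0, 0.0, f],
        [0.0, 0.0, -1.0 / tx, (cx - cx_right) / tx],
    ])


# Made-up example: f = 500 px, principal point (320, 240), baseline 0.1 units
Q = make_q(f=500.0, cx=320.0, cy=240.0, tx=-0.1)
raw = np.array([[800]], dtype=np.int16)  # 800 / 16 = 50 px disparity
d = fixed_point_to_pixels(raw)[0, 0]
print(reproject_pixel(Q, 320.0, 240.0, d))  # approximately [0, 0, 1]
```

With this Q, depth works out to Z = -f * tx / d, so halving the disparity doubles the depth. The resulting coordinates come out in the units of the baseline; since the question's calibration scales the object points by SQUARE_SIZE in cm, they would be in centimetres there.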

The documentation of cv2.imshow reveals that 32-bit floating-point values are assumed to lie between 0 and 1, so they're multiplied by 255. 255 is the saturation point at which a pixel is displayed as white. Since disparities are measured in pixels, nearly every valid value is 1 or more and ends up white, while invalid (negative) values end up black, so in my case this assumption produced a binary map. I manually scaled the values to the range 0-255 and then divided them by 255 to cancel out OpenCV's own multiplication by 255. I know, it's a horrible operation, but I'm only doing it to tune my StereoBM offline, so performance isn't critical. The solution looks like this:

# Other code as above
disp = block_matcher.compute(rectified_l, rectified_r, disptype=cv2.CV_32F)
norm_coeff = 255 / disp.max()
cv2.imshow("disparity", disp * norm_coeff / 255)
cv2.waitKey(0)

Then the disparity map looks okay.
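To see why the raw float map looked binary, here is a rough NumPy model of what the cv2.imshow documentation says happens to a 32-bit float image: values are multiplied by 255 and saturated to the 0-255 display range. The helper names are hypothetical, and real OpenCV rounding may differ in edge cases. The second function shows that the scaling above, disp * (255 / disp.max()) / 255, simplifies to disp / disp.max(); an 8-bit alternative such as cv2.normalize(disp, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U) would avoid the float path altogether.

```python
import numpy as np


def imshow_float_model(img):
    """Approximate how cv2.imshow turns a float32 image into 8-bit pixels:
    multiply by 255, round, and saturate to 0..255."""
    scaled = np.rint(np.asarray(img, dtype=np.float32) * 255)
    return np.clip(scaled, 0, 255).astype(np.uint8)


def scale_for_imshow(disp):
    """The normalisation above: disp * (255 / max) / 255 == disp / max.
    Assumes disp.max() > 0."""
    disp = np.asarray(disp, dtype=np.float32)
    return disp / disp.max()


# Made-up disparities in pixels; -1 stands in for an invalid match
disp = np.array([[-1.0, 0.0, 8.0, 16.0, 32.0]], dtype=np.float32)
print(imshow_float_model(disp))                    # values: 0, 0, 255, 255, 255
print(imshow_float_model(scale_for_imshow(disp)))  # values: 0, 0, 64, 128, 255
```

Without scaling, every disparity of one pixel or more saturates to white, which matches the black-and-white patches in the question.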