I have a C++ project where I am using OpenCV and libfreenect. I do not want to pull in something as big and heavy as OpenNI and make my OpenCV installation depend on it in the process. I want to use the calibration information provided here to undistort and align the RGB and depth images.
Undistorting the images individually, based on the camera matrix and distortion coefficients, was easy enough. But now I am confused about how to use the rectification and projection matrices to align the RGB and depth images so that they essentially show the same scene from the same perspective. After searching around for quite some time, I cannot establish how such a flow should work with OpenCV. My vague guess is that reprojectImageTo3D() and warpPerspective() might be used, but I am not sure how.
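For context, the per-image undistortion I am already doing looks roughly like this; the function and variable names are my own, and the camera matrix and distortion coefficients are filled in from the calibration data:

#include <opencv2/opencv.hpp>

// Build the undistortion maps once for one camera (640x480 Kinect frames)
void BuildUndistortMaps(const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs,
                        cv::Mat& map1, cv::Mat& map2) {
    cv::initUndistortRectifyMap(cameraMatrix, distCoeffs,
                                cv::Mat(),      // no rectification applied here
                                cameraMatrix,   // keep the same intrinsics
                                cv::Size(640, 480), CV_16SC2, map1, map2);
}

// Undistort one frame; nearest-neighbour for depth so distance values are not blended
void UndistortFrame(const cv::Mat& src, cv::Mat& dst,
                    const cv::Mat& map1, const cv::Mat& map2, bool isDepth) {
    cv::remap(src, dst, map1, map2, isDepth ? cv::INTER_NEAREST : cv::INTER_LINEAR);
}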
How could I approach this problem? I am using the old XBOX360 Kinect (with 0-2047 raw disparity value range).
UPDATE
Here is the partial code I have written so far:
// I use callback functions to get RGB (CV_8UC3) and depth (CV_16UC1)
// I undistort them and call the following method
// (This needs <opencv2/opencv.hpp>, and <cmath> for tanf.)
void AlignImages(cv::Mat& pRGB, cv::Mat& pDepth) {
    // Stereo extrinsics (depth camera -> RGB camera) from the calibration data
    cv::Mat rotationMat = (cv::Mat_<double_t>(3, 3) <<
         9.9984628826577793e-01,  1.2635359098409581e-03, -1.7487233004436643e-02,
        -1.4779096108364480e-03,  9.9992385683542895e-01, -1.2251380107679535e-02,
         1.7470421412464927e-02,  1.2275341476520762e-02,  9.9977202419716948e-01);
    cv::Mat translationMat = (cv::Mat_<double_t>(3, 1) <<
         1.9985242312092553e-02, -7.4423738761617583e-04, -1.0916736334336222e-02);

    // Make a float copy so the raw depth data can be converted to physical distance
    cv::Mat tempDst;
    pDepth.convertTo(tempDst, CV_32F);

    // 3-channel double image that will hold the reprojected 3D points
    cv::Mat tempDst3D(cv::Size(640, 480), CV_64FC3, cv::Scalar::all(0));

    float_t* tempDstData = (float_t*)tempDst.data;
    double_t* tempDst3DData = (double_t*)tempDst3D.data;
    size_t pixelSize = tempDst.step / sizeof(float_t);      // floats per row
    size_t pixel3DSize = tempDst3D.step / sizeof(double_t); // doubles per row (3 per pixel)

    for (int row = 0; row < tempDst.rows; row++) {
        for (int col = 0; col < tempDst.cols; col++) {
            // Convert raw disparity values (0-2047) to physical distance (in metres)
            float_t& pixel = tempDstData[pixelSize * row + col];
            pixel = 0.1236f * tanf(pixel / 2842.5f + 1.1863f);

            // Reproject to 3D space using the depth camera intrinsics:
            // X comes from the column with cx_d/fx_d, Y from the row with cy_d/fy_d,
            // and each pixel occupies three consecutive doubles in the buffer
            double_t& pixel3D_X = tempDst3DData[pixel3DSize * row + 3 * col];
            double_t& pixel3D_Y = tempDst3DData[pixel3DSize * row + 3 * col + 1];
            double_t& pixel3D_Z = tempDst3DData[pixel3DSize * row + 3 * col + 2];
            pixel3D_X = (col - 3.3930780975300314e+02) * pixel / 5.9421434211923247e+02;
            pixel3D_Y = (row - 2.4273913761751615e+02) * pixel / 5.9104053696870778e+02;
            pixel3D_Z = pixel;
        }
    }

    // Apply the extrinsic transform P3D' = R * P3D + T to every point:
    // cv::transform() applies the 3x4 matrix [R | T] to each 3-channel element
    cv::Mat extrinsics;
    cv::hconcat(rotationMat, translationMat, extrinsics);
    cv::transform(tempDst3D, tempDst3D, extrinsics);
    // (pRGB is not used yet; the next step would project the points into it)
}
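In case I am misusing cv::transform() above, I assume the same extrinsic step could also be written out explicitly per point; this is just my own sketch of the arithmetic, which would be called from inside AlignImages() as ApplyExtrinsics(tempDst3D, rotationMat, translationMat) in place of the cv::transform() line:

// Explicit per-point alternative to the cv::transform() call:
// applies P3D' = R * P3D + T to every point of a CV_64FC3 point image
static void ApplyExtrinsics(cv::Mat& points3D, const cv::Mat& R, const cv::Mat& T) {
    for (int row = 0; row < points3D.rows; row++) {
        for (int col = 0; col < points3D.cols; col++) {
            cv::Vec3d& p = points3D.at<cv::Vec3d>(row, col);
            cv::Mat q = R * cv::Mat(p) + T;   // 3x1 double result
            p = cv::Vec3d(q.at<double>(0), q.at<double>(1), q.at<double>(2));
        }
    }
}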
I have directly used the numbers instead of assigning them to variables, but that should not be a problem for understanding the logic. At this point, I am supposed to do the following:
P2D_rgb.x = (P3D'.x * fx_rgb / P3D'.z) + cx_rgb
P2D_rgb.y = (P3D'.y * fy_rgb / P3D'.z) + cy_rgb
But I do not understand exactly how to do this; perhaps I am going in the wrong direction altogether. I cannot find any example of this being done.
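For reference, here is a rough sketch of how I imagine this projection step, continuing from the transformed points in tempDst3D; the function name and registeredDepth are my own, and fx_rgb, fy_rgb, cx_rgb, cy_rgb would be the RGB camera intrinsics from the calibration data:

// Hypothetical final step: project the transformed 3D points into the RGB image
// plane and build a depth map registered to the RGB camera.
cv::Mat ProjectToRGB(const cv::Mat& points3D,
                     double fx_rgb, double fy_rgb, double cx_rgb, double cy_rgb) {
    cv::Mat registeredDepth(points3D.size(), CV_32F, cv::Scalar::all(0));
    for (int row = 0; row < points3D.rows; row++) {
        for (int col = 0; col < points3D.cols; col++) {
            const cv::Vec3d& p = points3D.at<cv::Vec3d>(row, col);
            if (p[2] <= 0) continue;                         // skip invalid depth
            int u = cvRound(p[0] * fx_rgb / p[2] + cx_rgb);  // column in the RGB image
            int v = cvRound(p[1] * fy_rgb / p[2] + cy_rgb);  // row in the RGB image
            if (u < 0 || u >= registeredDepth.cols || v < 0 || v >= registeredDepth.rows)
                continue;
            // if several depth pixels land on the same RGB pixel, keep the nearest one
            float& d = registeredDepth.at<float>(v, u);
            if (d == 0.0f || p[2] < d) d = (float)p[2];
        }
    }
    return registeredDepth;  // depth in metres, aligned with the RGB image
}

If this is roughly right, registeredDepth would give me, for each RGB pixel, the distance of the scene point visible there, which is the alignment I am after. But I still do not see where the rectification and projection matrices from the calibration data fit into this, so any guidance would be appreciated.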