0 votes

Right now, I am trying to determine a transform matrix in the Kinect v2.0 framework. My end goal is the translation and rotation matrix needed to convert my point cloud, which is relative to the depth sensor, into a point cloud relative to the color camera.

I know that I can manually calibrate (via OpenCV, etc.) to determine the transform, but I would like the actual camera matrix. I use the call MapDepthFrameToCameraSpaceUsingIntPtr, so I know that there is an internal understanding (i.e. a matrix transform) between depth space and color space.
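For context, here is a minimal sketch of how I currently build the cloud. It assumes the managed Microsoft.Kinect (C#) API; in my actual code I use the UsingIntPtr overload with raw frame buffers, but the array overload shows the same mapping:

```csharp
using Microsoft.Kinect;

static class CurrentMapping
{
    static void Main()
    {
        // Open the default sensor and get its coordinate mapper.
        KinectSensor sensor = KinectSensor.GetDefault();
        sensor.Open();
        CoordinateMapper mapper = sensor.CoordinateMapper;

        // One 512x424 depth frame (placeholder data here; normally filled from a DepthFrame).
        ushort[] depthData = new ushort[512 * 424];
        CameraSpacePoint[] cloud = new CameraSpacePoint[512 * 424];
        mapper.MapDepthFrameToCameraSpace(depthData, cloud);

        // 'cloud' now holds a 3D point per depth pixel, but it is relative to the
        // depth/IR camera, not the color camera -- hence my question about the extrinsics.
    }
}
```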

Does anyone know how to extract the matrix that the Kinect v2 API uses internally for the MapDepthFrameToCameraSpaceUsingIntPtr call, if such a matrix exists? Or is there a way to translate a point cloud image frame into color camera space?


2 Answers

0 votes

The SDK internally knows the rotation and translation between the two cameras, as well as the color camera parameters. Unfortunately, as long as you use the Microsoft SDK, that data is not exposed (only the depth camera parameters are public). Either you calibrate the cameras yourself or you use the look-up table the SDK provides.
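For reference, a minimal sketch of the two things the managed (C#) Microsoft.Kinect API does expose: the depth camera intrinsics and the per-pixel depth-to-camera-space look-up table. The depth-to-color extrinsics are not available through any public call as far as I know:

```csharp
using System;
using Microsoft.Kinect;

static class ExposedCalibrationData
{
    static void Main()
    {
        // Open the default sensor and get its coordinate mapper.
        KinectSensor sensor = KinectSensor.GetDefault();
        sensor.Open();
        CoordinateMapper mapper = sensor.CoordinateMapper;

        // Depth camera intrinsics (focal lengths, principal point, radial distortion)
        // are public. They may read as zero until the sensor has delivered some frames.
        CameraIntrinsics depth = mapper.GetDepthCameraIntrinsics();
        Console.WriteLine($"fx={depth.FocalLengthX} fy={depth.FocalLengthY} " +
                          $"cx={depth.PrincipalPointX} cy={depth.PrincipalPointY}");

        // Per-pixel look-up table: for each depth pixel it gives the factors that,
        // multiplied by the measured depth, yield the X and Y of the 3D point in the
        // depth/IR camera's space. The depth-to-color extrinsics are NOT exposed.
        PointF[] table = mapper.GetDepthFrameToCameraSpaceTable();
        Console.WriteLine($"Look-up table entries: {table.Length}"); // 512 * 424 = 217,088
    }
}
```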

0 votes

What you are trying to do (the transformation) is called registration. I have explained here in detail how to do the registration.

I know that I can manually calibrate (via OpenCV, etc.), to determine the transform, but I would like the actual camera matrix.

Calibrating your cameras is the only way to get the most accurate camera matrix for your Kinect sensor, since every Kinect sensor's camera matrix differs slightly from the others. That small difference becomes significant once you build your point cloud.
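To illustrate what you do with the result of such a calibration: once you have a rotation R and translation t from the depth camera to the color camera (e.g. from an OpenCV stereo calibration), moving a 3D point into color camera space is just p' = R p + t. The R and t below are placeholders, not real Kinect extrinsics:

```csharp
using System;

static class DepthToColorExample
{
    // Hypothetical extrinsics from a depth-to-color calibration (placeholder values,
    // NOT the Kinect v2 factory calibration).
    static readonly double[,] R =
    {
        { 1.0, 0.0, 0.0 },
        { 0.0, 1.0, 0.0 },
        { 0.0, 0.0, 1.0 },
    };
    static readonly double[] t = { 0.052, 0.0, 0.0 }; // ~52 mm baseline, illustrative only

    // p_color = R * p_depth + t
    static double[] DepthToColorSpace(double[] p)
    {
        return new[]
        {
            R[0, 0] * p[0] + R[0, 1] * p[1] + R[0, 2] * p[2] + t[0],
            R[1, 0] * p[0] + R[1, 1] * p[1] + R[1, 2] * p[2] + t[1],
            R[2, 0] * p[0] + R[2, 1] * p[1] + R[2, 2] * p[2] + t[2],
        };
    }

    static void Main()
    {
        double[] pDepth = { 0.1, 0.2, 1.5 };         // a point in depth camera space (metres)
        double[] pColor = DepthToColorSpace(pDepth); // the same point in color camera space
        Console.WriteLine($"({pColor[0]}, {pColor[1]}, {pColor[2]})");
    }
}
```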

Does anyone know how to extract the matrix that the Kinect v2 API uses internally for the MapDepthFrameToCameraSpaceUsingIntPtr call, if such a matrix exists?

You can extract part of the matrix, but not all of it. Something very important: MapDepthFrameToCameraSpaceUsingIntPtr is not processed on your CPU. It is calculated inside a chip in the Kinect hardware itself, and the values of the matrix are embedded in that chip. The reason is the sheer number of calculations this API call requires. The Kinect runs at 30 frames per second, each color frame has 1920 x 1080 pixels, and each depth frame has 512 x 424 pixels, so the mapping needs at least

30 x 512 x 424 = 6,512,640 calculations per second.

You can't build the point cloud in real-world coordinate space without knowing the camera matrix. If you build the point cloud directly from depth image coordinates, then that point cloud is in depth space, not camera space.
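As a sketch of why the camera matrix matters, this is the standard pinhole back-projection from a depth pixel to a 3D point. It assumes fx, fy, cx, cy taken from the depth camera matrix (e.g. CoordinateMapper.GetDepthCameraIntrinsics()) and ignores lens distortion and the SDK's axis conventions, so it will not reproduce MapDepthFrameToCameraSpace exactly:

```csharp
using System;

static class BackProjection
{
    // Back-project one depth pixel (u, v) with depth in millimetres into a 3D point
    // (metres) in the depth camera's coordinate space using the pinhole model:
    //   Z = depth,  X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy
    // Lens distortion and the SDK's axis conventions are ignored for brevity.
    static (float X, float Y, float Z) DepthPixelToCameraSpace(
        float fx, float fy, float cx, float cy, int u, int v, ushort depthMm)
    {
        float z = depthMm / 1000f;
        return ((u - cx) * z / fx, (v - cy) * z / fy, z);
    }

    static void Main()
    {
        // Placeholder intrinsics, roughly in the range reported for a Kinect v2 depth
        // camera; read the real values from your own sensor.
        float fx = 365f, fy = 365f, cx = 256f, cy = 212f;

        var p = DepthPixelToCameraSpace(fx, fy, cx, cy, 256, 212, 1500);
        Console.WriteLine($"({p.X}, {p.Y}, {p.Z})"); // ~ (0, 0, 1.5) at the principal point
    }
}
```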

I have developed a prototype for 3D interaction with real-time point cloud visualization.

You can check out my repository VRInteraction.

Demo video

Calibrated color and depth camera matrix

As you can see on the right-hand side of the video, it is a real-time 3D point cloud. I achieved this using CUDA (GPU acceleration) by registering the depth frame to the color frame and building an RGBXYZ point cloud.