What you are trying to do (the transformation) is called registration. I have explained here how to do the registration.
> I know that I can manually calibrate (via OpenCV, etc.), to determine the transform, but I would like the actual camera matrix.
Calibrating your camera is the only way to get the most accurate camera matrix for your Kinect sensor, since every Kinect sensor's camera matrix differs slightly from the others. That small difference becomes significant once you build your point cloud.
> Do anyone know how to extract, or if there exists a matrix inside the Kinect v2 API, that they use internally for the MapDepthFrameToCameraSpaceUsingIntPtr call?
You can extract part of the matrix, but not all of it. Something very important to understand is that MapDepthFrameToCameraSpaceUsingIntPtr is not processed on your CPU; it is calculated by a chip inside the Kinect hardware itself, and the values of the matrix are embedded in that chip. The reason for this is the amount of computation the API call requires: the Kinect runs at 30 frames per second, each color frame is 1920 x 1080 pixels, and each depth frame is 512 x 424 pixels, so mapping every depth pixel takes at least

30 x 512 x 424 = 6,512,640

calculations per second.
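That said, you can read out the part that is exposed: the SDK returns the depth camera's intrinsics (focal lengths, principal point, and radial distortion coefficients) through CoordinateMapper.GetDepthCameraIntrinsics(). As far as I know the depth-to-color extrinsics are not exposed, which is why you only get part of the matrix. A minimal C# sketch against the managed SDK 2.0 API (note the intrinsics may read back as zeros until the sensor has started streaming depth frames):

```csharp
using System;
using Microsoft.Kinect; // Kinect for Windows SDK 2.0

class IntrinsicsDump
{
    static void Main()
    {
        KinectSensor sensor = KinectSensor.GetDefault();
        sensor.Open();

        // May contain zeros until at least one depth frame has arrived.
        CameraIntrinsics ci = sensor.CoordinateMapper.GetDepthCameraIntrinsics();

        // These values form the pinhole part of the depth camera matrix:
        // | fx  0  cx |
        // |  0  fy  cy |
        // |  0   0   1 |
        Console.WriteLine($"fx = {ci.FocalLengthX}, fy = {ci.FocalLengthY}");
        Console.WriteLine($"cx = {ci.PrincipalPointX}, cy = {ci.PrincipalPointY}");
        Console.WriteLine($"k2 = {ci.RadialDistortionSecondOrder}, " +
                          $"k4 = {ci.RadialDistortionFourthOrder}, " +
                          $"k6 = {ci.RadialDistortionSixthOrder}");

        sensor.Close();
    }
}
```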
You can't build the point cloud in real-world coordinate space without knowing the camera matrix. If you build the point cloud directly from depth image coordinates, that point cloud is in depth space, not in metric camera space.
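To make the role of the camera matrix concrete: once you have fx, fy, cx, and cy, back-projecting a depth pixel into metric camera space is a standard pinhole computation. The sketch below (BuildPointCloud is just an illustrative helper) ignores lens distortion, and the axis signs may need flipping to match the SDK's CameraSpacePoint convention; it is roughly what the hardware-accelerated mapping call computes for you:

```csharp
// Requires: using Microsoft.Kinect;
// depthMm: raw Kinect v2 depth frame (512 x 424, values in millimetres).
// Returns points in metres in the depth camera's coordinate system.
static CameraSpacePoint[] BuildPointCloud(ushort[] depthMm, CameraIntrinsics ci)
{
    const int width = 512, height = 424;
    var cloud = new CameraSpacePoint[width * height];

    for (int v = 0; v < height; v++)
    {
        for (int u = 0; u < width; u++)
        {
            int i = v * width + u;
            float z = depthMm[i] * 0.001f;                      // mm -> metres
            cloud[i].X = (u - ci.PrincipalPointX) * z / ci.FocalLengthX;
            cloud[i].Y = (v - ci.PrincipalPointY) * z / ci.FocalLengthY;
            cloud[i].Z = z;                                     // stays 0 where depth is invalid
        }
    }
    return cloud;
}
```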
I have developed a prototype for 3D interaction with real-time point cloud visualization. You can check out my repository, VRInteraction.
Demo video
Calibrated color and depth camera matrix
As you can see on the right-hand side of the video, it is a real-time 3D point cloud. I achieved this using CUDA (GPU acceleration) by registering the depth frame to the color frame and building an RGBXYZ point cloud.
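If you want the same RGBXYZ output without writing CUDA, the SDK's own mapper can do the registration on the CPU. A rough sketch (my project does this on the GPU; this is just the equivalent idea using the SDK's MapDepthFrameToCameraSpace and MapDepthFrameToColorSpace calls):

```csharp
// Requires: using System; using Microsoft.Kinect;
// depthData: 512 x 424 depth frame; colorBgra: 1920 x 1080 BGRA color frame.
static void BuildRgbXyz(CoordinateMapper mapper, ushort[] depthData, byte[] colorBgra,
                        CameraSpacePoint[] xyz, ColorSpacePoint[] uv)
{
    mapper.MapDepthFrameToCameraSpace(depthData, xyz);  // XYZ for every depth pixel
    mapper.MapDepthFrameToColorSpace(depthData, uv);    // matching color pixel for every depth pixel

    for (int i = 0; i < depthData.Length; i++)
    {
        // Invalid mappings come back as -Infinity.
        if (float.IsInfinity(uv[i].X) || float.IsInfinity(uv[i].Y))
            continue;

        int cx = (int)(uv[i].X + 0.5f);
        int cy = (int)(uv[i].Y + 0.5f);
        if (cx < 0 || cx >= 1920 || cy < 0 || cy >= 1080)
            continue;                                    // falls outside the color frame

        int c = (cy * 1920 + cx) * 4;                    // BGRA layout
        byte b = colorBgra[c], g = colorBgra[c + 1], r = colorBgra[c + 2];
        // xyz[i] together with (r, g, b) is one RGBXYZ point; store it however your renderer expects.
    }
}
```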