I am using a Google Tango tablet to acquire point cloud data and RGB camera images, and I want to create a 3D scan of a room. For that I need to map 2D image pixels to point cloud points. I will be doing this with many point clouds and their corresponding images, so I need to write a script with two inputs, (1) a point cloud and (2) an image taken from the same position in the same direction, that outputs a colored point cloud. How should I approach this, and which platforms are simplest to use?
Here is the math to map a 3D point v to 2D pixel space in the camera image (assuming that v already incorporates the extrinsic camera position and orientation; see the note at the bottom*):
// Project to tangent space.
vec2 imageCoords = v.xy/v.z;
// Apply radial distortion.
float r2 = dot(imageCoords, imageCoords);
float r4 = r2*r2;
float r6 = r2*r4;
imageCoords *= 1.0 + k1*r2 + k2*r4 + k3*r6;
// Map to pixel space.
vec3 pixelCoords = cameraTransform*vec3(imageCoords, 1);
where cameraTransform is the 3x3 matrix:
[ fx  0  cx ]
[  0 fy  cy ]
[  0  0   1 ]
with fx, fy, cx, cy, k1, k2, k3 taken from TangoCameraIntrinsics.
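For readers working outside a shader, here is a minimal Python sketch of the same projection. The function name project_point and its parameter names are hypothetical; it assumes the point is already in the color camera frame and that the intrinsics come from TangoCameraIntrinsics as described above. Unlike the GLSL, it rejects points with z <= 0, since those cannot be seen by the camera.

```python
def project_point(v, fx, fy, cx, cy, k1, k2, k3):
    """Map a 3D point v = (x, y, z), assumed to be in the color camera frame,
    to (column, row) pixel coordinates using the polynomial radial model."""
    x, y, z = v
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    # Project onto the normalized image plane (z = 1).
    xn, yn = x / z, y / z
    # Apply radial distortion: scale by 1 + k1*r^2 + k2*r^4 + k3*r^6.
    r2 = xn * xn + yn * yn
    r4 = r2 * r2
    r6 = r2 * r4
    scale = 1.0 + k1 * r2 + k2 * r4 + k3 * r6
    xd, yd = xn * scale, yn * scale
    # Apply the 3x3 camera matrix; the homogeneous coordinate stays 1,
    # so only the first two rows matter.
    return (fx * xd + cx, fy * yd + cy)


# Developer-tablet intrinsics as listed in the Unity answer on this page.
INTRINSICS = dict(fx=1042.73999023438, fy=1042.96997070313,
                  cx=637.273986816406, cy=352.928985595703,
                  k1=0.228532999753952, k2=-0.663019001483917,
                  k3=0.642908990383148)

print(project_point((0.1, -0.05, 1.5), **INTRINSICS))
```

A point on the optical axis lands exactly on (cx, cy), which is a quick sanity check when wiring this up to real data.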
pixelCoords is declared vec3 but is actually 2D in homogeneous coordinates. The third coordinate is always 1, so it can be ignored for practical purposes.
Note that if you want texture coordinates instead of pixel coordinates, that is just another linear transform that can be premultiplied onto cameraTransform ahead of time (as can any top-to-bottom vs. bottom-to-top scanline addressing).
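As a sketch of that premultiplication, the Python below builds a pixel-to-texture matrix and folds it into cameraTransform once, ahead of time. The 1280x720 image size, the [0, 1] texture range, and the optional bottom-to-top flip are assumptions for illustration, and the helper names are hypothetical; adjust them to whatever convention your texture actually uses.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply3(m, p):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * p[k] for k in range(3)) for i in range(3)]


def uv_camera_transform(fx, fy, cx, cy, width, height, flip_v=True):
    """Premultiply a pixel-to-UV matrix onto the camera matrix.

    Pixel (px, py) maps to u = px / width and to v = 1 - py / height when
    flip_v is set (bottom-to-top addressing), else v = py / height."""
    camera = [[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]]
    if flip_v:
        to_uv = [[1.0 / width, 0.0, 0.0],
                 [0.0, -1.0 / height, 1.0],
                 [0.0, 0.0, 1.0]]
    else:
        to_uv = [[1.0 / width, 0.0, 0.0],
                 [0.0, 1.0 / height, 0.0],
                 [0.0, 0.0, 1.0]]
    return matmul3(to_uv, camera)


# Assumed 1280x720 color image; computed once and reused for every point.
uv_transform = uv_camera_transform(1042.74, 1042.97, 637.27, 352.93, 1280, 720)
# Distorted, normalized image coordinates (x, y, 1) -> (u, v, 1).
print(apply3(uv_transform, [0.0, 0.0, 1.0]))
```

The per-point cost is unchanged: the combined matrix has the same zero pattern in its last row, so the result's third coordinate is still 1.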
As for what "platform" (which I loosely interpreted as "language") is simplest, the native API seems to be the most straightforward way to get your hands on camera pixels, though it appears people have also succeeded with Unity and Java.
* Points delivered by TangoXYZij already incorporate the depth camera extrinsic transform. Technically, because the current developer tablet shares the same hardware between depth and color image acquisition, you won't be able to get a color image that exactly matches unless both your device and your scene are stationary. Fortunately, in practice, most applications can probably assume that neither the camera pose nor the scene changes enough in one frame time to significantly affect color lookup.
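Putting this together for the original question, here is a minimal Python sketch of a script that colors a point cloud from one image. It assumes the points are already expressed in the color camera frame (per the note above), that the image is a list of rows of (r, g, b) tuples, and that pixel centers sit at integer coordinates; it uses nearest-pixel lookup. The function name and data layout are hypothetical, not part of any Tango API. Points behind the camera or projecting outside the image are dropped.

```python
import math


def colorize_point_cloud(points, image, fx, fy, cx, cy, k1, k2, k3):
    """Return a list of (x, y, z, (r, g, b)) for points that land in the image.

    points: iterable of (x, y, z) in the color camera frame (assumed).
    image:  list of rows, each a list of (r, g, b) tuples (assumed layout).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    colored = []
    for x, y, z in points:
        if z <= 0:
            continue  # behind the camera: no color available
        # Normalized image plane, then radial distortion.
        xn, yn = x / z, y / z
        r2 = xn * xn + yn * yn
        scale = 1.0 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2
        # Camera matrix: pixel coordinates (column, row).
        px = fx * xn * scale + cx
        py = fy * yn * scale + cy
        # Nearest-pixel lookup, rounding half up.
        col, row = math.floor(px + 0.5), math.floor(py + 0.5)
        if 0 <= col < width and 0 <= row < height:
            colored.append((x, y, z, image[row][col]))
    return colored
```

For real data you would call this once per (point cloud, image) pair with the TangoCameraIntrinsics values; bilinear interpolation instead of nearest-pixel lookup is a straightforward refinement if the colors look blocky.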
This answer is not original; it is simply meant as a convenience for Unity users who would like the correct answer, as provided by @rhashimoto, worked out for them. My contribution (hopefully) is code that reduces the usual 16 multiplies and 12 adds (Unity only has 4x4 matrices) to 2 multiplies and 2 adds by dropping all of the terms that are always zero. I ran a little under a million points through a test, checking each time that my result agreed with the plain matrix calculation, where "agreed" means the absolute difference between the two was no greater than float.Epsilon (the threshold in the test code; in C# that is the smallest positive float rather than machine epsilon, so this is effectively an exact-match test). I'm as comfortable with this as I can be, knowing that @rhashimoto may show up and poke a giant hole in it :-)
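To see why the reduction is safe, here is a small Python check (function names are hypothetical, not part of the original answer). It multiplies (x, y, 1, 0), which is what the Vector3 becomes when Unity widens it to a Vector4, by a 4x4 matrix laid out like the one in the Unity method of this answer, and compares that with the 2-multiply, 2-add form. It uses Python doubles rather than Unity's 32-bit floats, so treat it as an illustration of the argument rather than a reproduction of the original test.

```python
import random

FX, FY = 1042.73999023438, 1042.96997070313
CX, CY = 637.273986816406, 352.928985595703

# Same row layout as the Matrix4x4 in the C# method.
FULL = [[FX, 0.0, CX, 0.0],
        [0.0, FY, CY, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]


def pixel_full(xd, yd):
    """Full 4x4 matrix-vector product with (xd, yd, 1, 0)."""
    v = (xd, yd, 1.0, 0.0)
    out = [sum(FULL[i][k] * v[k] for k in range(4)) for i in range(4)]
    return out[0], out[1]


def pixel_reduced(xd, yd):
    """Only the non-zero terms: 2 multiplies and 2 adds."""
    return FX * xd + CX, FY * yd + CY


def max_difference(n=10_000, seed=0):
    """Largest absolute disagreement over n random distorted coordinates."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n):
        xd, yd = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        (a, b), (c, d) = pixel_full(xd, yd), pixel_reduced(xd, yd)
        worst = max(worst, abs(a - c), abs(b - d))
    return worst


print(max_difference())
```

Because the dropped terms only ever add exact zeros, the two results are identical in IEEE arithmetic and the script prints 0.0.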
If you want to switch back and forth between the two versions, remember this is C#, so #define USEMATRIXMATH must appear at the very beginning of the file, before any other code (including using directives).
Given there's only one Tango device right now, and assuming the intrinsics are the same across all devices, I just hard-coded them as constants:
fx = 1042.73999023438
fy = 1042.96997070313
cx = 637.273986816406
cy = 352.928985595703
k1 = 0.228532999753952
k2 = -0.663019001483917
k3 = 0.642908990383148
Yes, they could be declared as named constants, which would make things more readable, and C# is probably smart enough to optimize that; however, I've spent too much of my life in Agner Fog's optimization manuals and will always be paranoid.
The commented-out code at the bottom is for testing the difference, should you want to. To use it, you'll have to uncomment the v1, v2 and v3 lines, make sure pixelCoords is also computed in the same build, and comment out the returns.
My thanks again to @rhashimoto; this is far, far better than what I had.
I have stayed true to his logic. Remember these are pixel coordinates, not UV coordinates. He is correct that you can premultiply the transform to get normalized UV values, but since he has schooled me on this once already, I will stick with exactly the math he presented rather than fiddle with it too much :-)
// Requires "using UnityEngine;" (and "#define USEMATRIXMATH" at the top of the file for the matrix path).
// Despite the name, this returns pixel coordinates, not normalized UV.
static public Vector2 PictureUV(Vector3 tangoDepthPoint)
{
    // Project onto the z = 1 plane.
    Vector2 imageCoords = new Vector2(tangoDepthPoint.x / tangoDepthPoint.z, tangoDepthPoint.y / tangoDepthPoint.z);

    // Radial distortion with k1, k2, k3.
    float r2 = Vector2.Dot(imageCoords, imageCoords);
    float r4 = r2 * r2;
    float r6 = r2 * r4;
    imageCoords *= 1.0f + 0.228532999753952f * r2 + -0.663019001483917f * r4 + 0.642908990383148f * r6;
    Vector3 ic3 = new Vector3(imageCoords.x, imageCoords.y, 1);

#if USEMATRIXMATH
    // Full matrix path: ic3 is implicitly widened to a Vector4 with w = 0.
    Matrix4x4 cameraTransform = new Matrix4x4();
    cameraTransform.SetRow(0, new Vector4(1042.73999023438f, 0, 637.273986816406f, 0));
    cameraTransform.SetRow(1, new Vector4(0, 1042.96997070313f, 352.928985595703f, 0));
    cameraTransform.SetRow(2, new Vector4(0, 0, 1, 0));
    cameraTransform.SetRow(3, new Vector4(0, 0, 0, 1));
    Vector3 pixelCoords = cameraTransform * ic3;
    return new Vector2(pixelCoords.x, pixelCoords.y);
#else
    // Reduced path: only the non-zero terms, px = fx*x + cx and py = fy*y + cy.
    //float v1 = 1042.73999023438f * imageCoords.x + 637.273986816406f;
    //float v2 = 1042.96997070313f * imageCoords.y + 352.928985595703f;
    //float v3 = 1;
    return new Vector2(1042.73999023438f * imageCoords.x + 637.273986816406f, 1042.96997070313f * imageCoords.y + 352.928985595703f);
#endif
    //float dx = Math.Abs(v1 - pixelCoords.x);
    //float dy = Math.Abs(v2 - pixelCoords.y);
    //float dz = Math.Abs(v3 - pixelCoords.z);
    //if (dx > float.Epsilon || dy > float.Epsilon || dz > float.Epsilon)
    //    UnityEngine.Debug.Log("Well, that didn't work");
    //return new Vector2(v1, v2);
}
As a final note, the code he provided is GLSL: if you're just using this for pretty pictures in a shader, use it. This C# version is for those who actually need to perform additional processing on the results.