3
votes

I am trying to create a dataset of images of objects at different poses, where each image is annotated with camera pose (or object pose).

If, for example, I have a world coordinate system and I place the object of interest at the origin and place the camera at a known position (x,y,z) and make it face the origin. Given this information, how can I calculate the pose (rotation matrix) for the camera or for the object.

I had one idea, which was to have a reference coordinate i.e. (0,0,z') where I can define the rotation of the object. i.e. its tilt, pitch and yaw. Then I can calculate the rotation from (0,0,z') and (x,y,z) to give me a rotation matrix. The problem is, how to now combine the two rotation matrices?

BTW, I know the world position of the camera as I am rendering these with OpenGL from a CAD model as opposed to physically moving a camera around.

1

1 Answers

1
votes

The homography matrix maps between homogeneous screen coordinates (i,j) to homogeneous world coordinates (x,y,z).

homogeneous coordinates are normal coordinates with a 1 appended. So (3,4) in screen coordinates is (3,4,1) as homogeneous screen coordinates.

If you have a set of homogeneous screen coordinates, S and their associated homogeneous world locations, W. The 4x4 homography matrix satisfies

S * H = transpose(W)

So it boils down to finding several features in world coordinates you can also identify the i,j position in screen coordinates, then doing a "best fit" homography matrix (openCV has a function findHomography)

Whilst knowing the camera's xyz provides helpful info, its not enough to fully constrain the equation and you will have to generate more screen-world pairs anyway. Thus I don't think its worth your time integrating the cameras position into the mix.

I have done a similar experiment here: http://edinburghhacklab.com/2012/05/optical-localization-to-0-1mm-no-problemo/