I have two calibrated cameras looking at an overlapping scene. I am trying to estimate the pose of camera2 with respect to camera1 (because camera2 can be moving; but both camera1 and 2 will always have some features that are overlapping).
I am identifying features using SIFT, computing the fundamental matrix and eventually the essential matrix. Once I solve for R and t (one of the four possible solutions), I obtain the translation up-to-scale, but is it possible to somehow compute the translation in real world units? There are no objects of known size in the scene; but I do have the calibration data for both the cameras. I've gone through some info about Structure from Motion and stereo pose estimation, but the concept of scale and the correlation with real world translation is confusing me.
Thanks!