I render a 3D mesh model using OpenGL with perspective camera – gluPerspective(fov, aspect, near, far).
Then I use rendered image in a computer vision algorithm.
At some point that algorithm requires camera matrix K (along with several vertices on the model and their corresponding projections) in order to estimate camera position: rotation matrix R and translation vector t. I can estimate R and t by using any algorithm which solves Perspective-n-Point problem.
I construct K from the OpenGL projection matrix (see how here)
K = [fX, 0, pX | 0, fY, pY | 0, 0, 1]
If I want to project a model point 'by hand' I can compute:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] / X_proj[3]
y_pixel = X_proj[2] / X_proj[3]
Anyway, I pass this camera matrix in a PnP algorithm and it works just fine.
But then I had to change perspective projection to orthographic one. As far as I understand when using orthographic projection the camera matrix becomes:
K = [1, 0, 0 | 0, 1, 0 | 0, 0, 0]
So I changed gluPerspective to glOrtho. Following the same way I constructed K from OpenGL projection matrix, and it turned out that fX and fY are not ones but 0.0037371. Is this a scaled orthographic projection or what?
Moreover, in order to project model vertices 'by hand' I managed to do the following:
X_proj = K*(R*X_model + t)
x_pixel = X_proj[1] + width / 2
y_pixel = X_proj[2] + height / 2
Which is not what I expected (that plus width and hight divided by 2 seems strange...). I tried to pass this camera matrix to POSIT algorithm to estimate R and t, and it doesn't converge. :(
So here are my questions:
- How to get orthographic camera matrix from OpenGL?
- If the way I did it is correct then is it true orthographic? Why POSIT doesn't work?