
I am having trouble converting the output of solvePnP to a camera position in Unity. I have spent the last several days going through documentation, reading every question I could find about it, and trying different approaches, but I am still stuck.

Here's what I can do. I have a 3D object in the real world with known 2D-3D correspondence points. I can use these with solvePnP to get rvec and tvec. Then I can plot the 2D image points, and on top of those I can plot the points found with projectPoints. These points line up pretty closely.

import cv2
import numpy as np

# Solve for the object pose from the known 2D-3D correspondences
(success, rotation_vector, translation_vector) = cv2.solvePnP(model_points, image_points, camera_matrix,
                                                              dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)

print("Rotation Vector:\n" + str(rotation_vector))
print("Translation Vector:\n" + str(translation_vector))

# Re-project the model points with the recovered pose to check the fit
(repro_points, jacobian) = cv2.projectPoints(model_points, rotation_vector,
                                             translation_vector, camera_matrix, dist_coeffs)

original_img = cv2.imread("og.jpg")
for p in repro_points:
    cv2.circle(original_img, (int(p[0][0]), int(p[0][1])), 3, (255, 255, 255), -1)
    print(str(p[0][0]) + "-" + str(p[0][1]))

for p in image_points:
    cv2.circle(original_img, (int(p[0]), int(p[1])), 3, (0, 0, 255), -1)

cv2.imshow("og", original_img)
cv2.waitKey()

The code above will display my original image with the image_points and repro_points more or less lined up. Now, from what I have read, tvec and rvec cannot be used in Unity directly; instead they represent the transformation between object space and camera space, so points can be mapped between the two spaces with the following formula.

p_camera = R * p_object + t,  where R = Rodrigues(rvec) and t = tvec
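For concreteness, here is a minimal sketch of that relationship in Python. It assumes rvec, tvec and model_points are the same variables as in the script above; the inverse transform at the end is the standard way to recover the camera pose in object space.

import cv2
import numpy as np

# object -> camera: R is the 3x3 rotation from Rodrigues, t is tvec
R, _ = cv2.Rodrigues(rvec)
t = tvec.reshape(3, 1)

p_object = model_points[0].reshape(3, 1)   # any known 3D point on the object
p_camera = R @ p_object + t               # the same point expressed in camera space

# inverting the transform gives the camera pose in object space
R_cam = R.T                                # camera orientation in object space
C = -R.T @ t                               # camera position in object space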

Now I want to take what solvePnP has given me, the rvec and tvec, and determine how to rotate and translate my Unity camera so that it lines up with the position from which the original picture was taken. In Unity I start with the camera facing my object and with both of them at (0, 0, 0). To keep things clear in my head, I made another Python script to try to convert rvec and tvec into a Unity position and rotation, based on advice I saw on the OpenCV forum. First I get the rotation matrix from Rodrigues and transpose it, then swap the 2nd and 3rd rows to swap y and z, although I don't know if that is right. Then I rotate the matrix and finally negate it and multiply it by tvec to get the position. But this position does not line up with the real world.

import sys
import cv2
import numpy as np


def main(argv):
    print("Start")

    rvec = np.array([0.11160132, -2.67532422, -0.55994949])
    rvec = rvec.reshape(3, 1)
    print("RVEC")
    print(rvec)

    tvec = np.array([0.0896325, -0.14819345, -0.36882839])
    tvec = tvec.reshape(3, 1)
    print("TVEC")
    print(tvec)

    rotmat, _ = cv2.Rodrigues(rvec)
    print("Rotation Matrix:")
    print(rotmat)

    #trans_mat = cv2.hconcat((rotmat, tvec))
    #print("Transformation Matrix:")
    #print(trans_mat)

    # transpose the rotation matrix
    transposed_mat = np.transpose(rotmat)
    print("Transposed Mat: ")
    print(transposed_mat)

    # swap the 2nd and 3rd rows (note: this swaps the rows of rotmat, not transposed_mat)
    swap = np.empty([3, 3])
    swap[0] = rotmat[0]
    swap[1] = rotmat[2]
    swap[2] = rotmat[1]

    print("SWAP")
    print(swap)

    R = np.rot90(swap)
    # this is supposed to be the rotation matrix for the camera
    print("R:")
    print(R)

    # translation matrix
    # they say the negative matrix is 1's on the diagonals -- do they mean the identity matrix?
    #negativematrix = np.identity(3)

    position = np.matmul(-R, tvec)
    print("Position: ")
    print(position)


if __name__ == "__main__":
    main(sys.argv[1:])

The output of this code is:

Start
RVEC
[[ 0.11160132]
 [-2.67532422]
 [-0.55994949]]
TVEC
[[ 0.0896325 ]
 [-0.14819345]
 [-0.36882839]]
Rotation Matrix:
[[-0.91550667  0.00429232 -0.4022799 ]
 [-0.15739624  0.91641547  0.36797976]
 [ 0.37023502  0.40020526 -0.83830888]]
Transposed Mat: 
[[-0.91550667 -0.15739624  0.37023502]
 [ 0.00429232  0.91641547  0.40020526]
 [-0.4022799   0.36797976 -0.83830888]]
SWAP
[[-0.91550667  0.00429232 -0.4022799 ]
 [ 0.37023502  0.40020526 -0.83830888]
 [-0.15739624  0.91641547  0.36797976]]
R:
[[-0.4022799  -0.83830888  0.36797976]
 [ 0.00429232  0.40020526  0.91641547]
 [-0.91550667  0.37023502 -0.15739624]]
Position: 
[[0.04754685]
 [0.39692311]
 [0.07887335]]

If I swapped y and z here I could sort of see it being close, but it is still not right.

To get the rotation, I have been doing the following. I also tried subtracting 180 from the y axis, since in Unity my camera and object are facing one another, but that is not coming out right for me either.

rotation_mat, jacobian = cv2.Rodrigues(rotation_vector)

pose_mat = cv2.hconcat((rotation_mat, translation_vector))  # 3x4 [R|t]

# camera position in object space: -R^T * t
tr = -np.matrix(rotation_mat).T * np.matrix(translation_vector)
print("TR_TR")
print(tr)

# decomposeProjectionMatrix returns the Euler angles as its last output
_, _, _, _, _, _, euler_angles = cv2.decomposeProjectionMatrix(pose_mat)

print("Euler:")
print(euler_angles)

I was feeling good this morning when I got the re-projected points to line up, but now I feel like I'm stuck in the mud. Any help is appreciated. Thank you.


1 Answer


I had a similar problem when I was writing an AR application for Unity. I remember spending several days too, until I figured it out. I had a DLL written in C++ OpenCV which took an image from a webcam, detected an object in the image and found its pose. A front end written in C# Unity would call the DLL's functions and update the position and orientation of a 3D model accordingly.

Simplified versions of the C++ and Unity code are:

void getCurrentPose(float* outR, float* outT)
{
    cv::Mat Rvec;
    cv::Mat Tvec;

    cv::solvePnP(g_modelPoints, g_imagePoints, g_cameraMatrix, g_distortionParams, Rvec, Tvec, false, cv::SOLVEPNP_ITERATIVE);

    cv::Matx33d R;
    cv::Rodrigues(Rvec, R);
    
    cv::Point3d T;
    T.x = Tvec.at<double>(0, 0);
    T.y = Tvec.at<double>(1, 0);
    T.z = Tvec.at<double>(2, 0);

    // Uncomment to return the camera transformation instead of model transformation
/*  const cv::Matx33d Rinv = R.t();
    const cv::Point3d Tinv = -R.t() * T;
    
    R = Rinv;
    T = Tinv;
*/

    for(int i = 0; i < 9; i++)
    {
        outR[i] = (float)R.val[i];
    }
    
    outT[0] = (float)T.x;
    outT[1] = (float)T.y;
    outT[2] = (float)T.z;
}

and

public class Vision : MonoBehaviour {
    GameObject model = null;

    void Start() {
        model = GameObject.Find("Model");
    }

    void Update() {
        float[] r = new float[9];
        float[] t = new float[3];
        
        dll_getCurrentPose(r, t); // Get object pose from DLL
    
        Matrix4x4 R = new Matrix4x4();

        R.SetRow(0, new Vector4(r[0], r[1], r[2], 0));
        R.SetRow(1, new Vector4(r[3], r[4], r[5], 0));
        R.SetRow(2, new Vector4(r[6], r[7], r[8], 0));
        R.SetRow(3, new Vector4(0, 0, 0, 1));

        Quaternion Q = R.rotation;

        model.transform.SetPositionAndRotation(
            new Vector3(t[0], -t[1], t[2]),
            new Quaternion(-Q.x, Q.y, -Q.z, Q.w));
    }
}

It should be easy to port to Python and try it. Since you want the camera transformation, you should probably uncomment the commented-out lines in the C++ code. I don't know if this code will work for your case, but it may be worth trying. Unfortunately I don't have Unity installed anymore, so I can't make any other suggestions.
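As a starting point, here is a rough, untested Python sketch of that port. It assumes your existing rvec and tvec from solvePnP, uses scipy (not part of your original code) only to turn the rotation matrix into a quaternion, and simply mirrors the sign flips from the C# snippet above after inverting the pose as the commented C++ lines do.

import cv2
import numpy as np
from scipy.spatial.transform import Rotation  # only used to build a quaternion

# rvec, tvec as returned by cv2.solvePnP
R, _ = cv2.Rodrigues(rvec)

# invert the pose to get the camera transformation (as in the commented C++ lines)
R_cam = R.T
t_cam = (-R.T @ tvec.reshape(3, 1)).flatten()

# quaternion (x, y, z, w) for the camera rotation
q = Rotation.from_matrix(R_cam).as_quat()

# mirror the OpenCV -> Unity handedness fix from the C# snippet:
# negate y of the position and the x, z components of the quaternion
unity_position = (t_cam[0], -t_cam[1], t_cam[2])
unity_quaternion = (-q[0], q[1], -q[2], q[3])

print("Unity position:", unity_position)
print("Unity rotation (x, y, z, w):", unity_quaternion)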