
I'm trying to use OpenCV with Python to track camera pose in a video stream. I have a code sample that determines the pose between two images as a test environment.

The overall flow here is this:

  1. Read in the images, convert to grayscale, and resize.
  2. Extract features from both images with cv2.goodFeaturesToTrack.
  3. Use cv2.calcOpticalFlowPyrLK to find matching points.
  4. Convert the p1 points (from the first image) to (x, y, z), with z set to 0 for every point.
  5. Run cv2.solvePnPRansac to get the rotation and translation vectors.
  6. Convert the angles from radians to degrees.

    import math

    import cv2
    import numpy as np

    def function(mtx, dist):
        # feature detection and optical-flow parameters
        feature_params = dict(maxCorners=1000,
                              qualityLevel=0.3,
                              minDistance=7,
                              blockSize=7)
        lk_params = dict(winSize=(15, 15),
                         maxLevel=2,
                         criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

        # image 1: read, resize, convert to grayscale, detect features
        image_1 = cv2.imread("/Users/user/Desktop/Test_images/Test_image.jpg")
        image_1 = cv2.resize(image_1, (640, 480))
        gray_1 = cv2.cvtColor(image_1, cv2.COLOR_BGR2GRAY)
        p1 = cv2.goodFeaturesToTrack(gray_1, mask=None, **feature_params)

        # image 2: deliberately the same image for this test
        image_2 = cv2.imread("/Users/user/Desktop/Test_images/Test_image.jpg")
        image_2 = cv2.resize(image_2, (640, 480))
        gray_2 = cv2.cvtColor(image_2, cv2.COLOR_BGR2GRAY)
        p2, st, err = cv2.calcOpticalFlowPyrLK(gray_1, gray_2, p1, None, **lk_params)

        # convert the old points to "3D" by appending z = 0 to each (x, y)
        zeros = np.zeros([len(p1), 1], dtype=np.float32)
        old_3d = np.dstack((p1, zeros))

        # get the rotation and translation vectors
        retval, rvecs, tvecs, inliers = cv2.solvePnPRansac(old_3d, p2, mtx, dist)

        # convert from radians to degrees
        rad = 180 / math.pi
        roll = rvecs[0] * rad
        pitch = rvecs[1] * rad
        yaw = rvecs[2] * rad

        print(roll)
        print(pitch)
        print(yaw)
        print(tvecs)

    function(mtx, dist)



    rvec (roll, pitch and yaw):
    [ 0.35305807]
    [ 2.95965708]
    [ 0.10954427]

    tvec (x, y, z ???):
    [[ -668.42397254]
     [ -387.32180857]
     [ 1180.94652875]]

Given that I am using exactly the same image for both frames, I was expecting the rotation and translation vectors to be very close to zero. However, they are quite large; take a look at the sample output above. Additionally, with different images and a known translation, the vectors are very wrong.

The question at hand: is my method sound? Have I approached this problem correctly? Have I matched the points correctly? Is this level of noise normal, or is there something I can do about it?

This does not look correct to me. To estimate the full camera pose, that is, the rotation and translation that transform a 3D point expressed in the object frame into a 3D point expressed in the camera frame, you need 3D/2D point correspondences. The 3D points must be expressed in the object frame. The 2D points must be the corresponding 3D points projected onto the current image plane according to the (estimated) camera pose and the intrinsic and distortion parameters. - Catree
Okay, I'm following what you're saying, but how can I do that? What would the workflow look like? Which OpenCV function would do that? - Jake3991

1 Answer


You cannot use PnP if you don't know the 3D structure, that is, the 3D coordinates of the points that you have matched.
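
For contrast, here is a minimal sketch (my illustration, not part of the original answer) of a case where the 3D structure is known: a chessboard calibration target, whose corner positions in the board's own frame are known by construction. The 9x6 pattern, the 25 mm square size, the "board.jpg" path, and the placeholder intrinsics are all assumptions:

    import cv2
    import numpy as np

    # Intrinsics would normally come from a prior cv2.calibrateCamera run;
    # these are placeholder values for illustration only.
    mtx = np.array([[800.0,   0.0, 320.0],
                    [  0.0, 800.0, 240.0],
                    [  0.0,   0.0,   1.0]])
    dist = np.zeros(5)

    pattern_size = (9, 6)   # inner corners per row/column (assumed target)
    square_size = 0.025     # square edge length in metres (assumed)

    # 3D corner coordinates in the board's own frame. Setting z = 0 is
    # legitimate here because the board really is planar; these are
    # object-frame coordinates, not pixel coordinates.
    obj_points = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
    obj_points[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
    obj_points *= square_size

    image = cv2.imread("board.jpg")  # hypothetical test image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)

    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.001))
        # Genuine 3D -> 2D correspondences: the pose is well defined.
        ok, rvec, tvec = cv2.solvePnP(obj_points, corners, mtx, dist)
        print(rvec.ravel(), tvec.ravel())

Taking pixel coordinates and appending a fake z = 0, as in the question, instead asks PnP to fit a pose to a fictitious plane, which is why the output vectors are large even for identical images.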

If you know the camera intrinsics, you can, however, estimate the homography (for a planar scene) or the essential matrix (for a general scene), and then decompose it to obtain the rotation and a translation up to scale. You can then do a bundle adjustment to refine the poses thus found.
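
Here is a rough sketch of the essential-matrix route (again my illustration, not code from the answer). It reuses the question's variables: p1, p2, and st from cv2.calcOpticalFlowPyrLK and the calibrated camera matrix mtx are assumed to exist as in the question's code:

    import cv2
    import numpy as np

    # Keep only the point pairs the tracker actually found.
    good_1 = p1[st == 1].reshape(-1, 2)
    good_2 = p2[st == 1].reshape(-1, 2)

    # Essential matrix with RANSAC to reject bad matches.
    E, mask = cv2.findEssentialMat(good_1, good_2, mtx,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)

    # Decompose into a rotation and a unit-length translation direction;
    # the translation scale cannot be recovered from two views alone.
    retval, R, t, mask = cv2.recoverPose(E, good_1, good_2, mtx, mask=mask)

    rvec, _ = cv2.Rodrigues(R)       # rotation vector, radians
    print(np.degrees(rvec).ravel())  # rotation in degrees
    print(t.ravel())                 # translation direction only, |t| = 1

For the planar case, cv2.findHomography followed by cv2.decomposeHomographyMat plays the analogous role. Either way, the translation comes out as a direction only, so recovering metric scale requires outside information such as a known baseline or object size.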

If this is a one-shot project, it may be faster to use a graphics environment like Blender rather than coding your own solution. See a matchmove tutorial here or here.