I'm trying to use OpenCV with Python to track the camera pose in a video stream. As a test environment, I have a code sample that determines the pose between two images.
The overall flow is this:
- Read in the images, convert them to grayscale, and resize.
- Extract features from both images with cv2.goodFeaturesToTrack.
- Use cv2.calcOpticalFlowPyrLK to find matching points.
- Convert the p1 points (from the starting image) to (x, y, z), with z set to 0 for all points.
- Solve cv2.solvePnPRansac to get the rotation and translation vectors.
- Convert the angles from radians to degrees.
import math
import cv2
import numpy as np

def function(mtx, dist):
    # feature-detection parameters
    feature_params = dict(maxCorners=1000,
                          qualityLevel=0.3,
                          minDistance=7,
                          blockSize=7)
    # Lucas-Kanade optical-flow parameters
    lk_params = dict(winSize=(15, 15),
                     maxLevel=2,
                     criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))
    # image 1
    image_1 = cv2.imread("/Users/user/Desktop/Test_images/Test_image.jpg")
    image_1 = cv2.resize(image_1, (640, 480))
    gray_1 = cv2.cvtColor(image_1, cv2.COLOR_BGR2GRAY)
    p1 = cv2.goodFeaturesToTrack(gray_1, mask=None, **feature_params)
    # image 2 (deliberately the same image, to test for a near-zero pose)
    image_2 = cv2.imread("/Users/user/Desktop/Test_images/Test_image.jpg")
    image_2 = cv2.resize(image_2, (640, 480))
    gray_2 = cv2.cvtColor(image_2, cv2.COLOR_BGR2GRAY)
    p2, st, err = cv2.calcOpticalFlowPyrLK(gray_1, gray_2, p1, None, **lk_params)
    # convert the old points to 3D by appending z = 0
    zeros = np.zeros([len(p1), 1], dtype=np.float32)
    old_3d = np.dstack((p1, zeros))
    # get the rotation and translation vectors
    retval, rvecs, tvecs, inliers = cv2.solvePnPRansac(old_3d, p2, mtx, dist)
    # convert the rotation vector from radians to degrees
    rad = 180 / math.pi
    roll = rvecs[0] * rad
    pitch = rvecs[1] * rad
    yaw = rvecs[2] * rad
    print(roll)
    print(pitch)
    print(yaw)
    print(tvecs)

function(mtx, dist)  # mtx and dist come from a prior camera calibration
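One detail I'm not sure about is whether I need to drop the points that calcOpticalFlowPyrLK failed to track (using the st status array it returns) before the PnP step. A minimal sketch of that filtering, with made-up point data standing in for the real outputs, would be:

```python
import numpy as np

# Made-up stand-ins for the outputs of cv2.calcOpticalFlowPyrLK:
# p1/p2 have shape (N, 1, 2); st has shape (N, 1) with 1 = tracked OK
p1 = np.random.rand(10, 1, 2).astype(np.float32) * 100
p2 = p1 + 0.5
st = np.ones((10, 1), dtype=np.uint8)
st[3] = 0  # pretend one point failed to track

# keep only the successfully tracked pairs
good_old = p1[st.ravel() == 1]
good_new = p2[st.ravel() == 1]
print(good_old.shape)  # (9, 1, 2)
```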
Sample output:

rvec (roll, pitch and yaw):
[ 0.35305807]
[ 2.95965708]
[ 0.10954427]

tvec (x, y, z ???):
[[ -668.42397254]
[ -387.32180857]
[ 1180.94652875]]
Given that I am using exactly the same image for both inputs, I was expecting the rotation and translation vectors to be very close to zero. Instead they are quite large, as the sample output above shows. Additionally, with different images that have a known translation between them, the vectors are very wrong.
So my questions are: is my method sound? Have I approached this problem correctly? Have I matched the points correctly? Is this level of noise normal, or is there something I can do about it?