3 votes

I would like to compare my visual odometry results with the ground truth provided by the KITTI dataset. For each frame in the ground truth, I have a projection matrix. For example:

1.000000e+00 9.043683e-12 2.326809e-11 1.110223e-16 9.043683e-12 1.000000e+00 2.392370e-10 2.220446e-16 2.326810e-11 2.392370e-10 9.999999e-01 -2.220446e-16

Here are the instructions provided by the readme:

Row i represents the i'th pose of the left camera coordinate system (i.e., z pointing forwards) via a 3x4 transformation matrix. The matrices are stored in row aligned order (the first entries correspond to the first row), and take a point in the i'th coordinate system and project it into the first (=0th) coordinate system. Hence, the translational part (3x1 vector of column 4) corresponds to the pose of the left camera coordinate system in the i'th frame with respect to the first (=0th) frame

But I don't know how to produce the same kind of data from my own results. What I have for each frame in my case:

  • The tf transform from init_camera (the fixed one at (0,0,0)) to the left camera, which is moving. So I have the translation vector and the rotation as a quaternion.
  • The odometry data: the pose and the twist
  • Camera calibration parameters

With those data, how do I compare with the ground truth? I need to build the same kind of matrix from the data above but don't know how to do it.

In the big picture, I would like to obtain a projection matrix, or to know how to decompose the projection matrices provided by the ground truth, in order to compare the transformations with my data.

Can someone help me?

Thanks


2 Answers

5 votes

Are you sure you want the projection matrix? A camera projection matrix is typically a 3x4 matrix that projects (homogeneous) points in R^3 to (homogeneous) points in R^2 in the image plane up to some scale (see the wikipedia entry). It sounds like you are interested in comparing your computed visual odometry to the ground truth odometry provided on the KITTI website; in this case, you would be comparing the rigid transformation matrices from your VO estimation to the KITTI ground truth transformations.

If you are using the "raw" datasets, the "ground truth" is provided as the OXTS data records - i.e. the combined IMU and GPS data. This data is in the global frame, and will take a bit more work to compare to your data. However, it sounds like you are using the odometry benchmark data; the ground truth transformations are already in the frame of the left camera, which should make the task a bit easier (and that is what I'll discuss).

Since you haven't specified a language I'll speak more generally here, but ROS does provide tools in C++ (tf and Eigen) and Python (transformations.py) to perform tasks such as converting quaternions to rotation matrices...

You have t and q, the translation vector and the rotation represented as a quaternion. You can convert the quaternion to a rotation matrix and vice versa (transformations.py uses the 'sxyz' convention for Euler angles, if you go that route). Each KITTI ground truth pose is specified as a 3x4 matrix, which is the rotation matrix concatenated with the translation vector (i.e. the 4th column is t_gt):

r11 r12 r13 t1
r21 r22 r23 t2
r31 r32 r33 t3
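
Here is a minimal sketch (C++ with Eigen, since the answer mentions it; the function name parsePoseLine and the variable names are only illustrative) of turning one such ground truth line into a rotation matrix, a translation vector, and a quaternion:

#include <Eigen/Dense>
#include <sstream>
#include <string>

void parsePoseLine(const std::string& line,
                   Eigen::Matrix3d& R_gt,
                   Eigen::Vector3d& t_gt,
                   Eigen::Quaterniond& q_gt)
{
  // 12 values, row-aligned 3x4 matrix [R | t]
  std::istringstream ss(line);
  double m[12];
  for (int i = 0; i < 12; ++i)
    ss >> m[i];

  R_gt << m[0], m[1], m[2],
          m[4], m[5], m[6],
          m[8], m[9], m[10];        // rotation: first three columns
  t_gt << m[3], m[7], m[11];        // translation: 4th column
  q_gt = Eigen::Quaterniond(R_gt);  // quaternion from the rotation matrix
}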

You can simply compute the translation error by computing the L2 norm: ||t - t_gt||. Computing the error in the rotation is a little bit more difficult; one way to do it is to use a function such as QuaternionBase::angularDistance() from Eigen, since both measurements should be in the same coordinate frame. To do this, you need to transform the ground truth rotation matrix into a quaternion, using either Eigen or the transformations.py library.
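
As a rough sketch (again C++/Eigen, with illustrative names, and assuming your estimate (t, q) and the ground truth (t_gt, q_gt) are expressed in the same coordinate frame), the two error terms could look like this:

#include <Eigen/Dense>

// L2 norm ||t - t_gt||
double translationError(const Eigen::Vector3d& t, const Eigen::Vector3d& t_gt)
{
  return (t - t_gt).norm();
}

// angle (in radians) between the estimated and ground truth rotations
double rotationError(const Eigen::Quaterniond& q, const Eigen::Quaterniond& q_gt)
{
  return q.angularDistance(q_gt);
}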

Keep in mind this is the error in the odometry frame - i.e. the error of your i-th estimated pose relative to the i-th ground truth pose in the frame of the initial pose. Sometimes, it is also useful to compare the average error frame-to-frame as well, especially since odometry tends to drift significantly over time, even if the algorithm is relatively accurate between frames on average.
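
If you also want that frame-to-frame comparison, one way (sketched here with Eigen::Isometry3d for the absolute poses; the helper name relativeErrors is only illustrative) is to compare the relative motion between consecutive frames for the estimate and the ground truth:

#include <Eigen/Geometry>

void relativeErrors(const Eigen::Isometry3d& T_est_prev, const Eigen::Isometry3d& T_est_cur,
                    const Eigen::Isometry3d& T_gt_prev,  const Eigen::Isometry3d& T_gt_cur,
                    double& trans_err, double& rot_err)
{
  // relative motion from frame i-1 to frame i, estimated and ground truth
  Eigen::Isometry3d d_est = T_est_prev.inverse() * T_est_cur;
  Eigen::Isometry3d d_gt  = T_gt_prev.inverse()  * T_gt_cur;

  // error transform: how far the estimated motion is from the true motion
  Eigen::Isometry3d err = d_gt.inverse() * d_est;
  trans_err = err.translation().norm();
  rot_err   = Eigen::AngleAxisd(err.rotation()).angle();
}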

In summary:

  • convert rotation matrices to quaternions to compute angular error (pay attention to the coordinate frames),
  • and use the formula ||t - t_gt|| to compute the translation error.
  • again, pay attention to your coordinate frames.
0 votes

I'm not very experienced with computer vision, but have a look at the sensor_msgs/CameraInfo message and the image_geometry::PinholeCameraModel class reference.

It should be something like this:

#include <image_geometry/pinhole_camera_model.h>
#include <sensor_msgs/CameraInfo.h>

void imageCallback([...], const sensor_msgs::CameraInfoConstPtr& info_msg) {
  // initialize the camera model from the sensor_msgs::CameraInfo message
  image_geometry::PinholeCameraModel cam_model_;
  cam_model_.fromCameraInfo(info_msg);

  // get the 3x4 projection matrix
  cv::Matx34d projection_matrix = cam_model_.projectionMatrix();
}