
The goal is to get a bird's eye view from KITTI dataset images, and I have the 3x4 projection matrix.

There are many ways to generate transformation matrices. For a bird's eye view I have come across several mathematical expressions, such as:

H12 = H2 * H1^-1 = A * R * A^-1 = P * A^-1, from "OpenCV - Projection, homography matrix and bird's eye view",

and x = P_i * Tr * X, from "kitti dataset camera projection matrix",

but none of these options worked for my purpose.
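For reference, the second expression is how the KITTI devkit projects a 3D point into the image. A minimal numpy sketch, where Tr is set to the identity as a placeholder (the real one comes from the calibration file) and the 3D point is made up:

```python
import numpy as np

# KITTI-style projection of a homogeneous 3D point X into the image:
# x = P2 @ Tr @ X. P2 is the 3x4 projection matrix from the question;
# Tr (e.g. velodyne -> rectified camera) is a placeholder identity here.
P2 = np.array([[721.5377, 0.0, 609.5593, 44.85728],
               [0.0, 721.5377, 72.854, 0.2163791],
               [0.0, 0.0, 1.0, 0.002745884]])
Tr = np.eye(4)

X = np.array([2.0, 1.0, 10.0, 1.0])  # hypothetical 3D point, homogeneous
x = P2 @ Tr @ X
u, v = x[0] / x[2], x[1] / x[2]      # divide by depth -> pixel coordinates
```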

PYTHON CODE

import numpy as np
import cv2

image = cv2.imread('Data/RGB/000007.png')

maxHeight, maxWidth = image.shape[:2]

# M has 3x4 dimensions
M = np.array([[721.5377, 0.0, 609.5593, 44.85728],
              [0.0, 721.5377, 72.854, 0.2163791],
              [0.0, 0.0, 1.0, 0.002745884]])

# cv2.warpPerspective needs a 3x3 matrix here, but M is 3x4
warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))

# show the original and warped images
cv2.imshow("Original", image)
cv2.imshow("Warped", warped)
cv2.waitKey(0)

I need to know how to use the projection matrix to obtain the bird's eye view.

So far, everything I have tried produces warped images that contain nothing close to what I need.

This is an example image from the KITTI dataset.

Here is another example image from the KITTI dataset.

On the left, the images show cars detected in 3D (above) and 2D (below). On the right is the bird's eye view I want to obtain. Therefore, I need the transformation matrix to transform the coordinates of the boxes that delimit the cars.

  • Can you show one of those KITTI bird's eye view images, to get an idea of what you want to achieve? – Micka
  • I have already added a couple of images to the post. – VíctorV
  • You want to get the image on the right? Just define 4 points on the ground plane of your camera image and the corresponding 4 points on the bird's eye image, then use getPerspectiveTransform to get the conversion. – Micka
  • If you have the boxes in 3D coordinates, just remove the z coordinate to get an orthographic projection. Afterwards, scale and translate the space to your desired image space. – Micka
  • I do not want to use the 4-point method because there are thousands of frames to process, and each frame provides its own projection matrix values. That's why I want to use that matrix; I expect it will give more accurate results. – VíctorV
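Micka's last suggestion (drop the z coordinate, then scale and translate into image space) can be sketched as follows. The box corners, resolution, and origin below are made-up placeholders:

```python
import numpy as np

# 3D box corners on the ground plane in car coordinates
# (x forward, y left, z up) -- hypothetical values, one corner per row.
corners = np.array([[10.0,  2.0, 0.0],
                    [10.0, -2.0, 0.0],
                    [14.0, -2.0, 0.0],
                    [14.0,  2.0, 0.0]])

pixel_per_meter = 10.0             # BEV resolution (placeholder)
origin = np.array([200.0, 400.0])  # car position in the BEV image (placeholder)

# Orthographic projection: drop z, then scale and translate into image
# space. Forward (x) maps to "up" (decreasing row), left (y) to image-left.
scaled = corners[:, :2] * pixel_per_meter
bev_pts = origin + np.stack([-scaled[:, 1], -scaled[:, 0]], axis=1)
```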

1 Answer


Here is my code to manually build a bird's eye view transform:

cv::Mat1d CameraModel::getInversePerspectiveMapping(double pixelPerMeter, cv::Point const & origin) const {
    // Focal length of the virtual bird's eye camera, scaled by the
    // real camera's height above the ground plane.
    double f = pixelPerMeter * cameraPosition()[2];

    // Axis swap: map the car's forward axis onto the image's vertical axis.
    cv::Mat1d R(3,3);
    R <<  0, 1, 0,
          1, 0, 0,
          0, 0, 1;

    // Intrinsics of the virtual bird's eye camera.
    cv::Mat1d K(3,3);
    K << f, 0, origin.x,
         0, f, origin.y,
         0, 0, 1;

    // Pixels -> rays (via inverse intrinsics) -> car/ground frame -> BEV pixels.
    cv::Mat1d transformToGround = K * R * mCameraToCarMatrix(cv::Range(0,3), cv::Range(0,3));
    return transformToGround * mIntrinsicMatrix.inv();
}

The member variables/functions used inside the function are

  • mCameraToCarMatrix: a 4x4 matrix holding the homogeneous rigid transformation from the camera's coordinate system to the car's coordinate system. The camera's axes are x-right, y-down, z-forward. The car's axes are x-forward, y-left, z-up. Within this function only the rotation part of mCameraToCarMatrix is used.
  • mIntrinsicMatrix: the 3x3 matrix holding the camera's intrinsic parameters
  • cameraPosition()[2]: the Z-coordinate (height) of the camera in the car's coordinate frame. It's the same as mCameraToCarMatrix(2,3).

The function parameters:

  • pixelPerMeter: the resolution of the bird's eye view image. A distance of 1 meter on the XY plane will translate to pixelPerMeter pixels in the bird's eye view image.
  • origin: the camera's position in the bird's eye view image

You can pass the transform matrix to cv::initUndistortRectifyMap() as newCameraMatrix and then use cv::remap() to create the bird's eye view image.
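A rough numpy transcription of the function above may help readers working in Python. All calibration values here are made-up placeholders; the real ones come from your camera-to-car calibration:

```python
import numpy as np

def inverse_perspective_mapping(pixel_per_meter, origin,
                                cam_to_car, intrinsics):
    """Sketch of getInversePerspectiveMapping in numpy.

    cam_to_car : 4x4 homogeneous camera -> car transform
    intrinsics : 3x3 camera matrix
    """
    # Virtual BEV focal length: resolution times camera height above ground.
    f = pixel_per_meter * cam_to_car[2, 3]

    # Axis swap: car's forward axis onto the image's vertical axis.
    R = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])

    # Intrinsics of the virtual bird's eye camera.
    K = np.array([[f, 0.0, origin[0]],
                  [0.0, f, origin[1]],
                  [0.0, 0.0, 1.0]])

    to_ground = K @ R @ cam_to_car[:3, :3]
    return to_ground @ np.linalg.inv(intrinsics)

# Placeholder calibration: camera 1.65 m above the ground, no rotation.
cam_to_car = np.eye(4)
cam_to_car[2, 3] = 1.65
intrinsics = np.array([[721.5377, 0.0, 609.5593],
                       [0.0, 721.5377, 172.854],
                       [0.0, 0.0, 1.0]])
H = inverse_perspective_mapping(20.0, (500, 500), cam_to_car, intrinsics)
```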