4 votes

I am working on fusing Lidar and camera images in order to perform an object classification algorithm using a CNN.

I want to use the KITTI dataset, which provides synchronized Lidar and RGB image data. A Lidar is a 3D scanner, so its output is a 3D point cloud.

I want to use the depth information from the point cloud as a channel for the CNN, but I have never worked with point clouds, so I am asking for some help. Will projecting the point cloud onto the camera image plane (using the projection matrices provided by KITTI) give me the depth map that I want? Is the Python PCL library useful, or should I move to C++ libraries?

If you have any suggestions, thank you in advance.


2 Answers

2 votes

I'm not sure what the projection matrix provided by KITTI includes, so the answer is: it depends. If that projection matrix only contains a transformation matrix, you cannot generate a depth map from it. The 2D image has distortion that comes from the camera, while the point cloud usually has no distortion, so you cannot "precisely" map the point cloud onto the RGB image without both the intrinsic and extrinsic parameters.

PCL is not required to do this.

A depth map is essentially a mapping of depth values onto the RGB image. You can treat each point in the point cloud (each laser return of the Lidar) as a pixel of the RGB image. Therefore, I think all you need to do is find which point in the point cloud corresponds to the first pixel (top-left corner) of the RGB image, and then read the depth values from the point cloud according to the RGB image resolution.
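For reference, here is a minimal sketch of that projection in Python/NumPy, assuming the standard KITTI calibration fields `P2`, `R0_rect`, and `Tr_velo_to_cam` have already been parsed from the calibration file into arrays (parsing omitted); no PCL is needed for this step:

```python
import numpy as np

def velo_to_depth_map(points, P2, R0_rect, Tr_velo_to_cam, img_h, img_w):
    """Project Velodyne points (N, 3) into the camera image plane and return
    a sparse depth map of shape (img_h, img_w), 0 where no point landed."""
    # Homogeneous Lidar coordinates (N, 4)
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])

    # Build 4x4 versions of the rectification rotation and the extrinsics
    R = np.eye(4)
    R[:3, :3] = R0_rect                              # 3x3 rectification
    T = np.vstack([Tr_velo_to_cam, [0, 0, 0, 1]])    # 3x4 -> 4x4 extrinsics

    # Lidar frame -> rectified camera frame
    pts_cam = (R @ T @ pts_h.T).T                    # (N, 4)

    # Keep only points in front of the camera
    pts_cam = pts_cam[pts_cam[:, 2] > 0]

    # Rectified camera frame -> pixel coordinates via the 3x4 projection P2
    proj = (P2 @ pts_cam.T).T                        # (N, 3)
    u = (proj[:, 0] / proj[:, 2]).astype(int)
    v = (proj[:, 1] / proj[:, 2]).astype(int)
    z = pts_cam[:, 2]                                # depth along camera axis, metres

    # Fill a sparse depth image, keeping the closest point per pixel
    depth = np.zeros((img_h, img_w), dtype=np.float32)
    inside = (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
    for ui, vi, zi in zip(u[inside], v[inside], z[inside]):
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:
            depth[vi, ui] = zi
    return depth
```

Note that the resulting depth map is sparse, since the Lidar returns only cover a fraction of the image pixels; as the other answer below mentions, interpolation or nearest-point filling can be used to densify it before using it as a CNN channel.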

0 votes

You have nothing to do with the camera here; this is all about the point cloud data. Let's say you have 10 million points and each point has x, y, z in meters (if the data is not in meters, convert it first). Then you need the position of the Lidar. When you subtract the sensor's position from all the points one by one, you move the Lidar to the origin (0, 0, 0), and then you can simply draw the points on a blank image.

The rest is simple math, and there are many ways to do it. The first that comes to my mind: treat RGB as base-256 digits. Let's say a 1 cm change in depth is a change of 1 in blue, a 256 cm change is a change of 1 in green, and a 256x256 = 65536 cm change is a change of 1 in red. Since the sensor is at (0, 0, 0), a point with RGB (1, 0, 0) means 1x65536 + 0x256 + 0x1 = 65536 cm away from it. This could be done in C++. You can also use interpolation and closest-point algorithms to fill the blanks, if there are any; see the sketch below.
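For illustration, here is a small sketch of that encoding in Python/NumPy (the answer mentions C++, but the arithmetic is the same); `depth_cm` is assumed to be a per-pixel depth array already expressed in centimetres:

```python
import numpy as np

def depth_cm_to_rgb(depth_cm):
    """Encode per-pixel depth in cm into an (H, W, 3) uint8 image:
    blue = 1 cm steps, green = 256 cm steps, red = 256*256 cm steps."""
    d = depth_cm.astype(np.uint32)
    r = (d // (256 * 256)) % 256
    g = (d // 256) % 256
    b = d % 256
    return np.stack([r, g, b], axis=-1).astype(np.uint8)

def rgb_to_depth_cm(rgb):
    """Invert the encoding: recover depth in cm from the RGB image."""
    rgb = rgb.astype(np.uint32)
    return rgb[..., 0] * 256 * 256 + rgb[..., 1] * 256 + rgb[..., 2]

# Example from the answer: a pixel with RGB (1, 0, 0) decodes to
# 1*65536 + 0*256 + 0*1 = 65536 cm from the sensor.
```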