3
votes

I have a large dataset of vehicle images with ground-truth lengths (over 100k samples). Is it possible to train a deep network to estimate vehicle length?

I haven't seen any papers on estimating object size with a deep neural network.

2
Have you tried looking into the sister site ai.stackexchange.com? – A.Rashad
It is really interesting; I have not seen this before. I will be surprised to see it working. – Bashar Haddad

2 Answers

2
votes

[Update: I didn't notice the computer-vision tag in the question, so my original answer was for a different question.]

Current convolutional neural networks are pretty good at identifying the vehicle model from raw pixels. The technique is called transfer learning: take a general pre-trained model, such as VGGNet or AlexNet, and fine-tune it on a vehicle dataset. For example, here's a report of a CS 231n course project that does exactly this (note: done by students, in 2015). No wonder there are already smartphone apps that do it.
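
To make that recipe concrete, here is a minimal transfer-learning sketch in PyTorch; the dataset path, the number of vehicle-model classes, and the hyperparameters are hypothetical placeholders, not values taken from the course project:

    # Fine-tune a pre-trained VGG16 to classify vehicle models (sketch only).
    import torch
    import torch.nn as nn
    from torchvision import models, datasets, transforms

    NUM_MODELS = 196  # hypothetical number of distinct vehicle models

    backbone = models.vgg16(pretrained=True)           # general pre-trained model
    for p in backbone.features.parameters():           # freeze convolutional features
        p.requires_grad = False
    in_feats = backbone.classifier[6].in_features
    backbone.classifier[6] = nn.Linear(in_feats, NUM_MODELS)  # new classification head

    tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    train_set = datasets.ImageFolder("vehicles/train", transform=tfm)  # hypothetical path
    loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

    opt = torch.optim.Adam(backbone.classifier[6].parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    backbone.train()
    for images, labels in loader:                      # one fine-tuning epoch
        opt.zero_grad()
        loss_fn(backbone(images), labels).backward()
        opt.step()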

So it's more or less a solved problem. Once you know the model type, it's easy to look up its size / length.

But if you're asking the more general question, where the vehicle isn't standard (e.g. it has a trailer or has been modified somehow), this is much more difficult, even for a human being. A slight change in perspective can result in a significant error, not to mention that some parts of the vehicle may simply not be visible. So the answer to that question is no.

Original answer (assumes the data is a table of general vehicle features, not a picture):

I don't see any difference between vehicle size prediction and, for instance, house price prediction. The process is the same (in the simplest setting): the model learns correlations between features and targets from the training data and is then able to predict the values for unseen data.

If you have good input features and a big enough training set (100k will do), you probably don't even need a deep network for this. In many cases I've seen, the simplest linear regression produces very reasonable predictions, and it can be trained almost instantly. So, in general, the answer is "yes", but it boils down to what particular data (features) you have.
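
As a concrete illustration of that baseline, here is a minimal linear-regression sketch with scikit-learn; the file name and feature columns are hypothetical stand-ins for whatever tabular features are actually available:

    # Predict vehicle length from tabular features with plain linear regression.
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    df = pd.read_csv("vehicles.csv")                      # hypothetical file
    X = df[["wheelbase", "width", "height", "mass"]]      # hypothetical features
    y = df["length"]                                      # ground-truth length

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = LinearRegression().fit(X_tr, y_tr)            # trains almost instantly
    print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))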

1
vote

You may do this under some strict conditions.

A brief introduction to Computer Vision / Multi-View Geometry:

In multi-view geometry, the main problem in estimating an object's size is finding the conversion function from camera (image) coordinates to real-world coordinates. Under certain conditions (e.g. capturing many sequential images, as in video / SfM, or taking pictures of the same object from different angles), we can estimate this conversion function. It therefore depends entirely on the camera parameters: focal length, pixel width / height, distortion, and so on. Once we have the camera-to-world conversion function, it is straightforward to compute the camera-to-point distance and hence the object's size.

So, for your task, you need to supply

  • image
  • camera's intrinsic parameters
  • (optionally) camera's extrinsic parameters

and hopefully get the output you desire.
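
To illustrate why the intrinsics matter, here is a toy sketch of the underlying pinhole-camera relation: with the focal length (in pixels) and the distance to the vehicle known, a pixel measurement converts to a metric length. All numbers below are illustrative assumptions, not calibrated values:

    # Pinhole model: real_length / depth = pixel_length / focal_length (similar triangles).
    def length_from_pixels(pixel_length_px, depth_m, focal_length_px):
        # Convert an image-plane extent to a real-world length, assuming the
        # object is roughly parallel to the image plane.
        return pixel_length_px * depth_m / focal_length_px

    # A vehicle spanning 900 px, 12 m from a camera with a 2400 px focal length:
    print(length_from_pixels(900, 12.0, 2400.0))   # -> 4.5 (metres)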

Alternatively, if you can fix the camera (same model, same intrinsic / extrinsic parameters), you can directly learn the correlation between that camera's images and distances / object sizes, with the image as the only input. However, the resulting network will most probably not work for different cameras.
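
For that fixed-camera setting, a minimal sketch of the idea is a small CNN that maps the raw image directly to a single length value; the architecture, image size, and batch below are arbitrary assumptions, only meant to show the regression setup:

    # Regress vehicle length directly from a fixed-camera image (sketch only).
    import torch
    import torch.nn as nn

    class LengthRegressor(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(32, 1)      # single output: predicted length

        def forward(self, x):
            return self.head(self.features(x).flatten(1))

    model = LengthRegressor()
    loss_fn = nn.MSELoss()                    # regression loss, not classification
    dummy = torch.randn(8, 3, 224, 224)       # a batch of fixed-camera images
    print(model(dummy).shape)                 # torch.Size([8, 1])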