8
votes

I want to recognize vehicles (cars, bikes, etc.) in a static image. I was thinking of using SURF to get useful keypoints and descriptors and then training an MLP (multilayer perceptron) neural network. However, I don't know what the input to the neural network should be, or what its output should be, so that I can identify which portion of the image contains a vehicle (probably as a rectangle drawn around it). I know that SURF can return useful keypoints in the image along with their descriptors (I have done this). The keypoints have angles, and each keypoint corresponds to a 64- or 128-element vector as its descriptor. What I don't know is what exactly these keypoints are and how they could be used as input to the neural network.

I am using OpenCV with Python.

I am new to SURF and other feature extraction methods. Any help with this would be appreciated.


3 Answers

8
votes

If you use SURF features, each descriptor is a vector of 64 or 128 floats, depending on your SURF configuration. You can set up the neural network as follows.

-Create a database of models:

-bikes
-cars
-buses
-trucks

-Take different photos of each type of object, for example 10 photos of different car models, 10 photos of different bike models, 10 photos of different truck models, and so on. For each photo of each object class, extract its SURF feature vectors.

-Each type of object is represented by one class in the neural network, like this:

-car   ; object class 1 = 4-bit representation = 0 0 0 1
-bike  ; object class 2 = 4-bit representation = 0 0 1 0
-truck ; object class 3 = 4-bit representation = 0 1 0 0
-ball  ; object class 4 = 4-bit representation = 1 0 0 0

-Each bit in this representation corresponds to one neuron in the output layer of the network and represents one class of object to be recognized, so exactly one bit is set per class.
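Since each output neuron stands for exactly one class, the training targets need a single 1 per class (a one-hot vector). A minimal sketch of building such targets follows; the name `NEURON_CLASSES`, the helper `one_hot`, and the left-to-right neuron order are assumptions made for illustration, chosen so that "car" sets the last bit.

```python
# Assumed neuron order, left to right, chosen so that "car" sets the last bit.
NEURON_CLASSES = ["ball", "truck", "bike", "car"]


def one_hot(label, neuron_classes=NEURON_CLASSES):
    """Return a target vector with a 1 for `label` and 0 for every other class."""
    if label not in neuron_classes:
        raise ValueError("unknown class: %r" % (label,))
    return [1 if name == label else 0 for name in neuron_classes]


# Print the target vector of each example class.
for name in ["car", "bike", "truck", "ball"]:
    print(name, one_hot(name))
```

Every descriptor extracted from a picture of a given class would share that class's target vector.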

The configuration of the neural network is then based on the size of the feature vector and the number of object types you want to recognize:

The number of neurons in the input layer: 64 or 128, depending on the size of the SURF feature vector you configured.

The number of neurons in the output layer: the number of object classes you want to recognize, 4 in this example.

The activation function needed for each neuron is the sigmoid or tanh function (http://www.learnartificialneuralnetworks.com/), because SURF features are represented by floating-point numbers. If you use FREAK features or another binary local feature descriptor (BRISK, ORB, BRIEF), then you would use a binary activation function for each neuron, such as a step function or a sigmoid function.

The algorithm used to train the network is backpropagation.
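To make this configuration concrete, here is a dependency-free sketch of a one-hidden-layer MLP with sigmoid neurons trained by backpropagation. It is an illustration under stated assumptions, not the OpenCV implementation: the hidden layer size, learning rate, and epoch count are arbitrary, and the toy data uses 8-dimensional random vectors as stand-ins for real 64- or 128-dimensional SURF descriptors. In practice you would more likely use OpenCV's own MLP class (`cv2.ml.ANN_MLP_create()`).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    # One row of weights per neuron; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward(layer, inputs):
    # zip() stops at the shorter sequence, so the bias is added separately.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1])
            for row in layer]


def train_step(hidden, output, x, target, lr):
    """One backpropagation update for one sample; returns its squared error."""
    h = forward(hidden, x)
    o = forward(output, h)
    # Error terms for squared error with sigmoid units.
    delta_o = [(ok - tk) * ok * (1.0 - ok) for ok, tk in zip(o, target)]
    delta_h = [hj * (1.0 - hj) * sum(d * row[j] for d, row in zip(delta_o, output))
               for j, hj in enumerate(h)]
    for d, row in zip(delta_o, output):
        for j, hj in enumerate(h):
            row[j] -= lr * d * hj
        row[-1] -= lr * d
    for d, row in zip(delta_h, hidden):
        for i, xi in enumerate(x):
            row[i] -= lr * d * xi
        row[-1] -= lr * d
    return sum((ok - tk) ** 2 for ok, tk in zip(o, target))


def train(samples, targets, n_hidden=6, epochs=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    hidden = init_layer(len(samples[0]), n_hidden, rng)
    output = init_layer(n_hidden, len(targets[0]), rng)
    order = list(range(len(samples)))
    history = []  # total squared error per epoch
    for _ in range(epochs):
        rng.shuffle(order)
        history.append(sum(train_step(hidden, output, samples[i], targets[i], lr)
                           for i in order))
    return hidden, output, history


# Toy data (assumption): 4 classes, 10 noisy samples each, 8 features in [0, 1].
data_rng = random.Random(1)
prototypes = [[data_rng.random() for _ in range(8)] for _ in range(4)]
samples, targets = [], []
for cls, proto in enumerate(prototypes):
    for _ in range(10):
        samples.append([min(1.0, max(0.0, v + data_rng.gauss(0, 0.05)))
                        for v in proto])
        targets.append([1.0 if k == cls else 0.0 for k in range(4)])

hidden, output, history = train(samples, targets)
print("error in first epoch: %.3f, last epoch: %.3f" % (history[0], history[-1]))
```

With real descriptors, the input layer would have 64 or 128 neurons and the output layer one neuron per object class.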

Before continuing, you need to prepare the data set used to train the neural network.

Example:

-all feature vectors extracted from pictures of a car are labeled as class 1
-all feature vectors extracted from pictures of a bike are labeled as class 2
-all feature vectors extracted from pictures of a truck are labeled as class 3
-all feature vectors extracted from pictures of a ball are labeled as class 4

In this example you will have 4 neurons in the output layer and 128 or 64 neurons in the input layer.

-In recognition mode, the output of the neural net is the class whose neuron has the highest value among these 4 neurons.
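Picking the winning neuron is an argmax over the output layer. A small sketch, where the neuron order in `NEURON_CLASSES` and the example activations are hypothetical values chosen for illustration:

```python
def predict_class(outputs, neuron_classes):
    """Return (class name, activation) of the output neuron with the highest value."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return neuron_classes[best], outputs[best]


# Assumed neuron order, left to right; class names are those of the example.
NEURON_CLASSES = ["ball", "truck", "bike", "car"]

# Hypothetical network output for one descriptor.
outputs = [0.05, 0.10, 0.15, 0.80]
label, score = predict_class(outputs, NEURON_CLASSES)
print(label, score)
```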

All features in the data set must be normalized to the interval [0,1] before the training phase begins, because the output of the neural net is the probability that the input vector belongs to one class of object in the data set.
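One common way to get features into [0,1] is per-feature min-max scaling. The sketch below assumes that scheme (the answer does not name one); `fit_min_max` and `apply_min_max` are hypothetical helper names, and the 2-element rows stand in for full descriptors.

```python
def fit_min_max(rows):
    """Compute the per-feature minimum and maximum over a list of feature vectors."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Scale every feature to [0, 1]; constant features are mapped to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Tiny stand-in for a set of descriptors (real ones have 64 or 128 values).
descriptors = [[-0.2, 0.5], [0.2, 0.5], [0.0, 0.5]]
mins, maxs = fit_min_max(descriptors)
scaled = apply_min_max(descriptors, mins, maxs)
print(scaled)
```

Computing `mins` and `maxs` on the training data and reusing them for new descriptors keeps both on the same scale.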

The data set used to train the network has to be split as follows:

-70% of the data is used for training
-15% of the data is used to validate the network architecture (number of neurons in the hidden layer)
-15% of the data is used to test the final network
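The split can be sketched as a shuffle followed by slicing. `split_dataset`, its parameters, and the fixed seed are assumptions for illustration; the samples can be any list of items.

```python
import random


def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle a copy of `samples` and cut it into train, validation, and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    n_val = int(round(val_frac * len(shuffled)))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains, about 15%
    return train, val, test


# 100 placeholder sample ids standing in for labeled feature vectors.
train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))
```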

When training the neural network, the stopping criterion is the recognition rate: stop when it is near 85-90%.

Why use a neural network and not an SVM? SVMs work fine, but they cannot build the best class-separation map in nonlinear classification problems like this one, or when you have many different object classes. This shortcoming shows up in the results of the recognition phase.

I recommend reading a bit about neural network theory to understand how they work:

http://link.springer.com/chapter/10.1007%2F11578079_10

OpenCV has a machine learning module with an MLP neural network class.

Hope this helps.

2
votes

My suggestion is to look at BoW (bag of words) instead of a neural network. See here an example of using SURF with the bag-of-words model for object classification (first part, second part). To improve classification performance, you could try replacing the naive Bayes classifier with an SVM. The author also provides a good source code example. I think it's a good starting point.
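The core idea of bag of words is that each image, however many descriptors it has, becomes one fixed-length histogram of "visual words". The sketch below shows only that step and is not taken from the linked tutorial: the vocabulary is hard-coded here (in practice it is usually learned by clustering training descriptors, for example with OpenCV's `cv2.BOWKMeansTrainer`), and the 2-element vectors stand in for SURF descriptors.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to `descriptor` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(descriptor, vocabulary[i]))


def bow_histogram(descriptors, vocabulary):
    """Normalized count of how many descriptors fall on each visual word."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


# Hypothetical 2-word vocabulary and the descriptors of one image.
vocabulary = [[0.0, 0.0], [1.0, 1.0]]
image_descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
histogram = bow_histogram(image_descriptors, vocabulary)
print(histogram)
```

The histogram, one per image, is what gets passed to the classifier (naive Bayes or SVM).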

2
votes

An easy way to separate the detected objects is to run a contour detector on the input image.

After that, you can take the x, y coordinates of each keypoint associated with a feature vector recognized by the neural network, and check how many of these keypoints fall inside the contour of each object.

This also lets you set a threshold to validate a correct car detection. For example, if you have 2 taxis, there is one contour for each car, and you then check how many keypoints are in each contour:

  • the contour of taxi 1 has 20 keypoints inside
  • the contour of taxi 2 has 5 keypoints inside

You can then report taxi 1 (car 1) as the recognized object.

Once you have a validated recognized object and its contour, you can calculate the bounding box that encloses the object.
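The counting and thresholding step can be sketched as follows. To keep the example free of dependencies, each contour is approximated by its bounding box in `(x, y, w, h)` form, the format returned by `cv2.boundingRect`. With real contours from `cv2.findContours` you could test points with `cv2.pointPolygonTest` instead. The threshold of 10 keypoints and all coordinates are made-up values.

```python
def count_points_inside(box, points):
    """Count the (x, y) points that fall inside a box given as (x, y, w, h)."""
    x, y, w, h = box
    return sum(1 for px, py in points if x <= px < x + w and y <= py < y + h)


def accept_detections(boxes, points, min_keypoints=10):
    """Keep only the boxes that contain at least `min_keypoints` recognized keypoints."""
    accepted = []
    for box in boxes:
        n = count_points_inside(box, points)
        if n >= min_keypoints:
            accepted.append((box, n))
    return accepted


# Two hypothetical object boxes (taxi 1 and taxi 2).
taxi_1 = (0, 0, 100, 100)
taxi_2 = (200, 0, 100, 100)

# Keypoints classified as "car": 20 fall in taxi 1, 5 fall in taxi 2.
keypoints = [(1 + 4 * i, 50) for i in range(20)] + [(210 + 10 * i, 20) for i in range(5)]

detections = accept_detections([taxi_1, taxi_2], keypoints)
print(detections)
```

Only the first box passes the threshold, matching the taxi example above.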

Another way to do it: for each training image, extract the contours belonging to each object, calculate its bounding box, use that crop as a clean image to extract the features, and do this for all pictures in the training set.