1
votes

I am currently classifying English alphabet characters with the SVM classifier in OpenCV, and I have the following questions:

  1. How does classification depend on the length of the feature vector? (What happens if the feature length increases? My current feature length is 125.)

  2. Does the time taken for prediction depend on the amount of data used for training?

  3. Why do we need to normalize the feature vector? (Will this improve prediction accuracy and the time required to predict the class?)

  4. How do I determine the best method for normalizing the feature vector?

2 Answers

2
votes

1) The length of the feature vector does not matter per se; what matters is the predictive quality of the features.

2) Not directly. Prediction cost depends on the number of features and on the number of support vectors the trained model keeps, not on the raw number of training samples (and prediction is generally very fast).

3) Normalization is needed when features span very different ranges of values.

4) There are basically two options: standardization (subtract the mean, divide by the standard deviation) and scaling (xmax -> +1, xmin -> -1 or 0). You could try both and see which one works better.
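
To make the two options in 4) concrete, here is a minimal NumPy sketch. The function names and the toy data are made up for illustration and are not part of OpenCV. The statistics are computed on the training set only and then reused for any later samples.

```python
import numpy as np

def fit_standardizer(X):
    """Per-feature mean and standard deviation of the training set."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid division by zero
    return mean, std

def fit_minmax(X):
    """Per-feature minimum and span (max - min) of the training set."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features: avoid division by zero
    return lo, span

def standardize(X, mean, std):
    """Standardization: zero mean and unit standard deviation per feature."""
    return (X - mean) / std

def minmax_scale(X, lo, span):
    """Scaling: training minimum -> -1, training maximum -> +1."""
    return 2.0 * (X - lo) / span - 1.0

# Hypothetical toy data: two features on very different ranges.
X_train = np.array([[0.0, 100.0],
                    [5.0, 300.0],
                    [10.0, 500.0]])

mean, std = fit_standardizer(X_train)
lo, span = fit_minmax(X_train)

Z = standardize(X_train, mean, std)   # standardized features
S = minmax_scale(X_train, lo, span)   # features scaled to [-1, +1]
```

To compare them, train one SVM on `Z` and one on `S` and check both on held-out data, applying the same `mean`/`std` or `lo`/`span` to the test samples.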

2
votes

When talking about classification, the data consists of feature vectors, each with a number of features. In image processing there are also image features, which are mapped to the classification feature vectors. So your "feature length" is actually the number of features, i.e. the feature vector size.

1) The number of features matters. In principle more features allow better classification, but they also make overtraining (overfitting) more likely. To avoid the latter you can add more samples (more feature vectors).

2) Yes, as the prediction time depends on the number of support vectors and on the size of each support vector (the number of features). But since prediction is very fast, this is not an issue unless you have real-time requirements.

3) While SVM, as a maximum-margin classifier, is quite robust against different feature value ranges, a feature with a bigger value range would carry more weight than one with a smaller range. This especially applies to the penalty calculation when the classes are not completely separable.

4) As SVM is quite robust against different value ranges (compared to cluster-oriented algorithms), this is not the biggest issue. Typically the absolute min/max are scaled to -1/+1. If you know the expected range of your data, you could scale that range instead, so that measurement errors in your data do not influence the scaling. A fixed range is also preferable when adding training data in an iterative process.
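
A rough sketch of the fixed-range idea from 4), in NumPy. The range 0..255 is only an assumed example (e.g. 8-bit pixel values), the function name is made up, and clipping out-of-range values is my own choice for handling measurement errors, not something the SVM requires.

```python
import numpy as np

def scale_fixed_range(X, lo, hi):
    """Map the known range [lo, hi] to [-1, +1].

    Values outside the expected range (e.g. measurement errors)
    are clipped instead of being allowed to stretch the scaling.
    """
    X = np.asarray(X, dtype=float)
    scaled = 2.0 * (X - lo) / (hi - lo) - 1.0
    return np.clip(scaled, -1.0, 1.0)

# Assumed expected range of every feature (hypothetical).
LO, HI = 0.0, 255.0

# First batch of training data.
batch1 = np.array([[0.0, 128.0],
                   [64.0, 255.0]])

# Batch added later; 300.0 lies outside the expected range.
batch2 = np.array([[300.0, 10.0]])

s1 = scale_fixed_range(batch1, LO, HI)
s2 = scale_fixed_range(batch2, LO, HI)
```

Because `LO` and `HI` never change, samples scaled earlier stay valid when new training data arrives. With a min/max taken from the data, the outlier in `batch2` would have shifted the scaling of everything.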