
I have a training dataset with 60,000 images and a testing dataset with 10,000 images. Each image represents a digit from 0 to 9. My goal was to use libsvm, a library for Support Vector Machines, to learn the digits from the training dataset and then use the resulting model to classify the images of the testing dataset.

Each image is 28x28, which means it has 784 pixels, i.e. 784 features. Although that sounds like a lot of features, it took only 5-10 minutes to run the SVM application on the training dataset, and the testing results were very good: a 93% success rate.
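For reference, a minimal sketch of how such a run can look with libsvm's MATLAB interface (svmtrain/svmpredict); the label variables and the option string are assumptions, not necessarily what was used here:

% train_labels: 60000x1, train_images: 60000x784 (one image per row)
% test_labels: 10000x1, test_images: 10000x784
model = svmtrain(train_labels, double(train_images), '-s 0 -t 2');   % C-SVC with an RBF kernel
[predicted, accuracy, dec_values] = svmpredict(test_labels, double(test_images), model);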

I decided to try PCA in MATLAB in order to reduce the number of features while not losing too much information.

[coeff, scores, latent] = princomp(train_images, 'econ');

I played with latent a little bit and found that keeping only the first 90 components results in roughly 10% information loss, so I decided to use just the first 90.
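That check can be done directly from latent, which holds the variance of each component; a short sketch (the 0.90 threshold corresponds to the ~10% loss mentioned above):

explained = cumsum(latent) / sum(latent);          % cumulative fraction of variance retained
number_of_features = find(explained >= 0.90, 1);   % comes out at roughly 90 for this data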

In the above code, train_images is an array of size [60000x784].

From this code I get scores, and from scores I simply took the number of features I wanted, so in the end I had a [60000x90] array for the training images.
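In code that is just a column slice of scores (train_reduced is a name introduced here for illustration):

train_reduced = scores(:, 1:number_of_features);   % 60000x90 matrix fed to the SVM training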

Question 1: What is the correct way to project the testing dataset onto the coefficients (coeff)?

I tried using the following:

test_images = test_images' * coeff;

Note that here test_images is an array of size [784x10000], while coeff is an array of size [784x784].

Then from the result I again took only the first 90 features by doing the following:

test_images = test_images(:,(1:number_of_features))';

which seemed to be correct. However, after running the training and then the prediction, I got a 60% success rate, which is much lower than the success rate I got without using any PCA at all.
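For comparison, a common convention when projecting new data with princomp is to first subtract the training mean, since princomp centers the data before computing coeff and scores. A minimal sketch of that projection (offered as a reference, not as a confirmed fix for the accuracy drop):

mu = mean(train_images, 1);                                    % 1x784 mean of the training data
test_centered = bsxfun(@minus, test_images', mu);              % 10000x784 (test_images is 784x10000 as above)
test_scores = test_centered * coeff(:, 1:number_of_features);  % 10000x90 projected test set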

Question 2: Why did I get such low results?

After PCA I scaled the data as always, which I guess is the correct thing to do. Not scaling is generally not a good idea according to the libsvm website, so I don't think that's the issue here.
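A sketch of the kind of scaling the libsvm guide describes, with the test set scaled by the training set's statistics (variable names carried over from the sketches above):

mins = min(train_reduced, [], 1);                  % per-feature minima from the TRAINING data only
feat_range = max(train_reduced, [], 1) - mins;
feat_range(feat_range == 0) = 1;                   % guard against constant features
train_scaled = bsxfun(@rdivide, bsxfun(@minus, train_reduced, mins), feat_range);
test_scaled  = bsxfun(@rdivide, bsxfun(@minus, test_scores,   mins), feat_range);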

Thank you in advance

I don't see anything particularly wrong with what you did. PCA might not be the best method in this case, since the resulting features might not be useful for classification. Is this data from the Kaggle digit recognition competition? PCA was the first thing I tried as well, and I also obtained poor results. When you look at the first few eigenvectors that represent each digit, do they look very blurry (this is what I saw)? I had more luck training an SVM directly on the vector of 784 pixels, without extracting any features. – MarkV
The only way PCA works well for me is when I don't project the testing dataset onto the coefficients of the training dataset (after running PCA on it), but instead build a new array containing both the training AND the testing dataset, so that the coefficients are computed over everything and no projection is needed at the end. Although that gives a very good result, around a 95% success rate, I don't think it's a valid thing to do, since the testing dataset is meant to be unknown. Still, I didn't expect such a low success rate when doing the projection... – ksm001
Yes, indeed, they look blurry. – ksm001

1 Answer


Regarding your first question, I believe MarkV has already provided you with an answer. As for the second question: PCA indeed conserves most of the variance of your data, but that does not necessarily mean it maintains 90% of the information in your data. Sometimes the information required for successful classification is actually located in the 10% you knocked off. A good example of this can be found here, especially figure 1 there.

So, if you get nice results with the full set of features, why reduce the dimensionality?

You might want to try playing with different principal components. What happens if you take components 91:180? That might be an interesting experiment...
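A sketch of that experiment, reusing the variable names from the earlier sketches (train_alt and test_alt are hypothetical names):

train_alt = scores(:, 91:180);                 % the next block of principal components
test_alt  = test_centered * coeff(:, 91:180);  % matching projection of the test set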