
I'm using libsvm (3.11) to implement SVM classification in my project (Text Classification using Multi Agent). But every time I predict, it gives the same label to all the test documents, i.e. either +1 for all of them or -1 for all of them, even though I'm using different kinds of data.

I'm using the following procedure to run libsvm classification on plain text documents:

-> There is a set of training text documents.

-> I convert these text documents into the libsvm format using TF-IDF weights (I take two folders that represent the two classes; I assign label -1 to the 1st folder and +1 to the 2nd, and each label is followed by the TF-IDF values for that text document).

-> After that I collect the bag of words into one plain text document, and then using those words I generate the test document vector with some label (I'm taking only one test document, so the IDF will always be 1 and there will be only one vector; I hope the label doesn't matter).

-> After that I apply the libsvm functions svm_train and svm_predict with default options.

Am I following the correct procedure? If anything in it is wrong, please feel free to tell me; it would really help.

And why does libsvm always give only one label as the result? Is it a fault in my procedure, or a problem with the tool? Thanks in advance.

The fault in the procedure is that you are just using the default parameters for svm-train. You need to do validation to choose a good set of parameters: this means you need to train with a range of different parameters and run prediction on a separate validation dataset (or use k-fold cross-validation), and finally check whether your model generalizes by predicting on a test data set.
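To make the parameter search concrete, here is a minimal sketch that builds one `svm-train` cross-validation command per (C, gamma) pair and parses the accuracy line that `svm-train -v` prints. The exponent ranges are a commonly suggested coarse grid for the RBF kernel, and `train.txt`, `grid_commands` and `parse_cv_accuracy` are hypothetical names used for illustration, not part of libsvm. libsvm also ships `tools/grid.py`, which automates this kind of search.

```python
import itertools
import re


def grid_commands(train_file, folds=5,
                  c_exps=range(-5, 16, 2), g_exps=range(-15, 4, 2)):
    """Build one svm-train cross-validation command per (C, gamma) pair."""
    commands = []
    for c_exp, g_exp in itertools.product(c_exps, g_exps):
        cost, gamma = 2.0 ** c_exp, 2.0 ** g_exp
        # -c: cost C, -g: kernel gamma, -v: n-fold cross-validation mode
        commands.append(["svm-train", "-c", repr(cost), "-g", repr(gamma),
                         "-v", str(folds), train_file])
    return commands


def parse_cv_accuracy(output):
    """Extract the percentage from svm-train's cross-validation output."""
    match = re.search(r"Cross Validation Accuracy = ([0-9.eE+-]+)%", output)
    return float(match.group(1)) if match else None
```

Each command can be run with `subprocess.run(cmd, capture_output=True, text=True)`; keep the pair with the highest parsed accuracy, then retrain on the full training file with those values and without `-v`.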
Some reasons the SVM will classify everything as one class: (1) poor choice of parameters, (2) the data are not separable, or (3) your training set is unbalanced (e.g. if you have 97% widgets and 3% wotzits, the classifier can achieve 97% accuracy just by classifying everything as a widget).
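A quick way to check reason (3) is to compute the accuracy of always predicting the majority class, and to derive per-class weights for svm-train's `-wi` option (which multiplies C for class i). This is only a sketch: the inverse-frequency weighting is one common heuristic, and the function names are hypothetical.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def balanced_weight_flags(labels):
    """Build svm-train -wi flags with weights inversely proportional to class frequency."""
    counts = Counter(labels)
    total = len(labels)
    flags = []
    for label in sorted(counts):
        weight = total / (len(counts) * counts[label])
        # e.g. "-w-1" sets the weight for class -1, "-w1" for class +1
        flags += [f"-w{label}", f"{weight:.4g}"]
    return flags
```

If the cross-validation accuracy is no better than the value returned by `majority_baseline`, the model has probably learned nothing beyond the class ratio.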

1 Answer


Why are you using different criteria to build the test document? The testing and training document sets should both be derived from your original set of "training text documents". I put that in quotes because you could take a subset of them and use it for testing. Ultimately, make sure your training and testing document sets are distinct and both come from the original set.
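As a rough sketch of what that could look like: split the labelled documents into disjoint training and testing lists, learn the vocabulary and IDF from the training part only, and reuse them to vectorize the test documents. The whitespace tokenizer, the smoothed IDF formula and all function names here are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def split_docs(docs, test_fraction=0.2, seed=0):
    """Shuffle (label, text) pairs and split them into disjoint train/test lists."""
    shuffled = docs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_idf(train_docs):
    """Learn term indices (1-based) and IDF values from the training documents only."""
    df = Counter()
    for _, text in train_docs:
        df.update(set(text.lower().split()))
    n = len(train_docs)
    vocab = {term: i + 1 for i, term in enumerate(sorted(df))}
    # Smoothed IDF; one of several common variants
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in df}
    return vocab, idf


def to_libsvm_line(label, text, vocab, idf):
    """Format one document as '<label> <index>:<value> ...' with ascending indices."""
    tf = Counter(t for t in text.lower().split() if t in vocab)
    feats = sorted((vocab[t], count * idf[t]) for t, count in tf.items())
    return " ".join([f"{label:+d}"] + [f"{i}:{v:.6g}" for i, v in feats])
```

Words in a test document that never appeared in training are simply dropped, and the test vectors keep their true labels so that svm-predict can report a meaningful accuracy.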