I'm trying to classify text documents into categories, for example:
Document 1: "Basketball is a good sport" ---> Category: Sport
Document 2: "World war 2 .." ---> Category: History
...
My goal is to create a Java interface that uses an SVM algorithm.
So I need an SVM library that I can use from Java. I found two:
- SVMlight
- LIBSVM
Should I use the first one or the second?
I have done a lot of research, and I found that I need to do two things:
- Prepare a training file. SVM tools expect a special format for this file (example: 1 1:317.5). My question is: what should I generate this file from? From the documents only, or from something else?
- Prepare a test file, meaning a new document to classify. Should I transform the new document into the same SVM file format?
Is that correct?
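To make the question concrete, here is a minimal sketch of how I imagine the conversion would work. It assumes a simple bag-of-words representation in which every distinct word gets a feature index and the feature value is the word count. The class name SvmFormatSketch, the tokenizer, and the label numbers (1 = Sport, 2 = History) are my own assumptions for illustration, not part of SVMlight or LIBSVM. It only produces lines of the form "label index:value ..." with ascending indices and does not call either library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns raw text into "label index:value ..." lines.
public class SvmFormatSketch {
    // Maps each distinct word to a 1-based feature index, in order of first appearance.
    private final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    // Lowercases the text and splits it on anything that is not a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Converts one document into a single line of the sparse format.
    // growVocabulary = true for training documents (new words get new indices),
    // false for test documents (words never seen in training are skipped).
    public String toSvmLine(int label, String text, boolean growVocabulary) {
        Map<Integer, Integer> counts = new TreeMap<>(); // TreeMap keeps indices ascending
        for (String token : tokenize(text)) {
            Integer index = vocabulary.get(token);
            if (index == null) {
                if (!growVocabulary) {
                    continue;
                }
                index = vocabulary.size() + 1;
                vocabulary.put(token, index);
            }
            counts.merge(index, 1, Integer::sum);
        }
        StringBuilder line = new StringBuilder().append(label);
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            line.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        SvmFormatSketch sketch = new SvmFormatSketch();
        // Training documents: assumed labels 1 = Sport, 2 = History.
        System.out.println(sketch.toSvmLine(1, "Basketball is a good sport", true));
        System.out.println(sketch.toSvmLine(2, "World war 2", true));
        // New document to classify: same vocabulary, label is only a placeholder.
        System.out.println(sketch.toSvmLine(1, "Basketball is a sport", false));
    }
}
```

If this is the right idea, the first two lines would go into the training file and the third into the test file, with both files sharing the same word-to-index mapping.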
Please guide me. I'm truly lost and I don't know what I should do!