I am trying to use an SVM for a text classification problem. I have found an SVM implementation called SVM light and its derivative SVM multiclass (for classification problems with more than 2 classes). However, I am not able to understand the format of the files used for training and testing the classifier. I understand that I need to create a feature vector (let us assume that I take each word in the document as a feature). Then, to create the train file, I have to specify for each document its class, the features it contains (actually the index of each feature in the feature vector), and a feature value. I am confused about this 'feature value'. What could it possibly be? Is it the count of that feature in the document, or is it something else? The example train file on the website does not use integers as feature values, which suggests that the feature value is not the raw frequency.
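To make my understanding concrete, here is a minimal sketch of how I imagine one line of the train file would be produced. It assumes the feature value is the raw word count, which is exactly the part I am unsure about, and it assumes the line layout `<class> <index>:<value> ...` with feature indices starting at 1 and listed in increasing order. The class name `SvmLightFormat` and the method `toLine` are my own hypothetical names, not part of SVM light.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of SVM light: turns a document into one train-file line.
class SvmLightFormat {

    // vocab maps each word to its 1-based feature index and grows as new words appear.
    static String toLine(int classLabel, String text, Map<String, Integer> vocab) {
        // TreeMap keeps the feature indices sorted in increasing order.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            Integer index = vocab.get(token);
            if (index == null) {
                index = vocab.size() + 1; // next unused feature index
                vocab.put(token, index);
            }
            // ASSUMPTION: the feature value is the raw count of the word in this document.
            counts.merge(index, 1, Integer::sum);
        }
        StringBuilder line = new StringBuilder(Integer.toString(classLabel));
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            line.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        System.out.println(toLine(1, "the cat sat on the mat", vocab)); // 1 1:2 2:1 3:1 4:1 5:1
        System.out.println(toLine(2, "the dog sat", vocab));            // 2 1:1 3:1 6:1
    }
}
```

If the feature value is supposed to be something other than a count (for example a normalized weight), I assume only the line marked ASSUMPTION would need to change.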
Also, I was wondering whether there is a tool or piece of software that creates this train file from a simple document. I generally work with Java, so a Java package that does this would be good enough for me. I tried searching Google but could not find anything relevant.
I would also like to know whether there is a better way to use SVM for text classification.
Any help in this regard would be greatly appreciated.