
From the following links I came up with an idea. I want to ask whether I am doing it right or going about it the wrong way. If I am going about it the wrong way, please guide me.

Links
Using libsvm for text classification c#
How to use libsvm for text classification?

My way

First, calculate the word counts in each training set (+ve and -ve).
Then, create a mapping list that gives each word an index.

For example:

Sample word counts from the training set:
|-----|-----------|
|     |   counts  |
|-----|-----|-----|
|text | +ve | -ve |
|-----|-----|-----|
|this | 3   | 3   |
|forum| 1   | 0   |
|is   | 10  | 12  |
|good | 10  | 5   |
|-----|-----|-----|
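A minimal Python sketch of the two steps above (per-class word counts and a word-to-index mapping). It assumes whitespace tokenization and lowercasing; the function names and the toy documents are hypothetical and are not part of libsvm. Indices start at 1 because libsvm feature indices are 1-based.

```python
from collections import Counter


def count_words_per_class(docs):
    """docs: list of (label, text) pairs, e.g. ("+ve", "this forum is good").

    Returns {label: Counter(word -> count)}, i.e. one column of the table per class.
    Tokenization is a plain lowercase whitespace split (an assumption)."""
    counts = {}
    for label, text in docs:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def build_word_index(docs):
    """Give each distinct word a 1-based feature index, in order of first appearance."""
    index = {}
    for _, text in docs:
        for word in text.lower().split():
            if word not in index:
                index[word] = len(index) + 1  # libsvm feature indices start at 1
    return index


# Hypothetical toy training set.
docs = [
    ("+ve", "this forum is good"),
    ("-ve", "this is not good"),
]
counts = count_words_per_class(docs)
index = build_word_index(docs)

print(counts["+ve"]["forum"], counts["-ve"]["forum"])  # 1 0 (Counter returns 0 for unseen words)
print(index)  # {'this': 1, 'forum': 2, 'is': 3, 'good': 4, 'not': 5}
```

First-appearance ordering is used here only so that the indices match the order in the question (this=1, forum=2, ...). Any fixed mapping works, as long as the training and test data are encoded with the same one.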

Positive training data:

this forum is good

So would the training entry for it be:

+1 1:3 2:1 3:10 4:10
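libsvm reads one instance per line in the form `<label> <index1>:<value1> <index2>:<value2> ...`, with indices in ascending order; zero-valued features can be left out. A hedged sketch of turning a document into such a line, assuming a word-to-index mapping already exists. `document_features` and `value_of` are hypothetical names, and which value goes into each feature is left as a parameter. Plugging in the +ve column of the sample table reproduces the line shown above.

```python
def document_features(text, index, value_of):
    """Map each known word in `text` to {feature index: value}.

    `index` is a word -> 1-based feature index mapping; `value_of` is a
    hypothetical callback that decides the value stored for a word.
    Words missing from `index` are skipped."""
    feats = {}
    for word in text.lower().split():
        if word in index:
            feats[index[word]] = value_of(word)
    return feats


def to_libsvm_line(label, features):
    """Format one instance as '<label> <index>:<value> ...'.

    Indices are written in ascending order, and zero values are dropped
    because libsvm treats absent features as zero."""
    parts = [f"{i}:{v}" for i, v in sorted(features.items()) if v != 0]
    return " ".join([str(label)] + parts)


# Toy data taken from the question's table (+ve column) and its word order.
index = {"this": 1, "forum": 2, "is": 3, "good": 4}
pos_counts = {"this": 3, "forum": 1, "is": 10, "good": 10}

line = to_libsvm_line("+1", document_features("this forum is good", index, pos_counts.get))
print(line)  # +1 1:3 2:1 3:10 4:10
```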

All of this is just what I took from the links above.
Please help me.


2 Answers


You're doing it right.

I don't know why your label is written as "+1"; it should be a simple integer (referring to the "+ve" documents), but all in all it's the way to go.

For document classification, you may want to take a look at liblinear, which is specifically designed for handling a large number of features.


You can also use LibShortText, available here: libshortText

It is a Python library.