
I want to use SVMlight for classification.

In the example on its site, the file format is given as:

<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float> 
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string>

I can't understand this format. What do "line" and "value" refer to? Here is part of the example training set:

1 6:0.0198403253586671 15:0.0339873732306071 29:0.0360280968798065 31:0.0378103484117687 41:0.0456787263779904 63:0.021442413608662 74:0.0813238108919922 75:0.0201048944012214 81:0.0603996615380116 142:0.0102897706466067 172:0.0777948548082322 174:0.072717200608936 179:0.054076743737027 180:0.0764456665578607 186:0.112586705083256 187:0.0925011727805475 205:0.156990770998115 229:0.0519505660963924 255:0.0923321093879611 308:0.0732972342194965 318:0.119740882706743 408:0.058414185072804 409:0.0506626198519805 465:0.0843545829662396 480:0.0729642744872502 519:0.118611296605205 664:0.112142083701314 679:0.374387819227881 720:0.0987664035972632 768:0.123975200617516 922:0.141018083523918 977:0.136393581474495 1018:0.107648758381437 1305:0.180449632267364 1581:0.141526866911118 1677:0.156124608446181 1817:0.141018083523918 2162:0.170921341813635 2314:0.164249324532582 2358:0.508349039100422 2419:0.150582824316425 3266:0.338899359400281 3374:0.166725496161846 8311:0.219691455487068

I know that in the first line of the data, 1 refers to a positive output and 6 refers to the target. The target refers to a word, and 0.0198403253586671 is the value. But I do not know how this value, 0.0198403253586671, is calculated.


2 Answers


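It is the grammar of all possible inputs; read ".=." as "is defined as".

Line 1: a line is a target, followed by a list of feature:value pairs, optionally followed by a comment after "#".
Line 2: a target is +1, -1, 0, or a float.
Line 3: a feature is an integer or the string "qid".
Line 4: a value is a float.
Line 5: the info (the comment) is a string.

So in your example, the leading 1 is the target (the class label), 6 is a feature number, and 0.0198403253586671 is the value of feature 6.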
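To make the grammar concrete, here is a minimal Python sketch that splits one line into its parts. The function name parse_svmlight_line is made up for illustration and is not part of SVMlight; the sketch only follows the grammar quoted in the question and does no error handling.

```python
def parse_svmlight_line(line):
    """Split one line of the quoted format into (target, features, info)."""
    # Everything after the first '#' is the free-form <info> comment.
    body, _, info = line.partition("#")
    tokens = body.split()

    # The first token is the <target>: +1, -1, 0, or any float.
    target = float(tokens[0])

    # Every remaining token is a <feature>:<value> pair.
    features = {}
    for token in tokens[1:]:
        name, _, value = token.partition(":")
        # <feature> is an integer, or the special string "qid".
        key = name if name == "qid" else int(name)
        features[key] = float(value)

    return target, features, info.strip()


target, features, info = parse_svmlight_line("1 6:0.0198 15:0.0340 # doc-1")
print(target, features, info)
```

Applied to the first line of the training set in the question, this would give a target of 1.0 and a dictionary that maps feature 6 to 0.0198403253586671, feature 15 to 0.0339873732306071, and so on.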


I'm using SVMlight for sentiment analysis. In that setting each feature is a term (word), and the value is calculated from the frequency of the term in the document.
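As a rough illustration of how such values could be produced, the sketch below counts term frequencies in a tokenized document, scales them to unit length, and writes one line in the format from the question. The unit-length (L2) scaling, the function name to_svmlight_line, and the vocabulary mapping are all assumptions made for this example; I do not know exactly which weighting was used for the example file on the SVMlight site.

```python
import math
from collections import Counter


def to_svmlight_line(label, tokens, vocabulary):
    """Build one training line from a tokenized document.

    vocabulary is a hypothetical mapping from term to integer feature
    number (starting at 1).
    """
    # Raw term frequencies, ignoring terms that have no feature number.
    counts = Counter(t for t in tokens if t in vocabulary)

    # Assumption: scale the frequency vector to unit Euclidean length.
    norm = math.sqrt(sum(c * c for c in counts.values()))

    # Emit pairs sorted by increasing feature number.
    pairs = sorted((vocabulary[t], c / norm) for t, c in counts.items())
    parts = [f"{label:+d}"] + [f"{i}:{v:.6g}" for i, v in pairs]
    return " ".join(parts)


vocabulary = {"good": 1, "movie": 2, "bad": 3}
print(to_svmlight_line(1, "good good movie".split(), vocabulary))
# prints: +1 1:0.894427 2:0.447214
```

Here "good" occurs twice and "movie" once, so the raw vector is (2, 1) and each entry is divided by sqrt(5). Other weightings, such as TF-IDF, follow the same pattern with a different formula for the value.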