1
votes

I want to use SVMlight for classification.

In the example on its site, the input file format is described as:

<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float> 
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string>

I can't understand this format. What do <line> and <value> refer to? Here is part of the example training set:

1 6:0.0198403253586671 15:0.0339873732306071 29:0.0360280968798065 31:0.0378103484117687 41:0.0456787263779904 63:0.021442413608662 74:0.0813238108919922 75:0.0201048944012214 81:0.0603996615380116 142:0.0102897706466067 172:0.0777948548082322 174:0.072717200608936 179:0.054076743737027 180:0.0764456665578607 186:0.112586705083256 187:0.0925011727805475 205:0.156990770998115 229:0.0519505660963924 255:0.0923321093879611 308:0.0732972342194965 318:0.119740882706743 408:0.058414185072804 409:0.0506626198519805 465:0.0843545829662396 480:0.0729642744872502 519:0.118611296605205 664:0.112142083701314 679:0.374387819227881 720:0.0987664035972632 768:0.123975200617516 922:0.141018083523918 977:0.136393581474495 1018:0.107648758381437 1305:0.180449632267364 1581:0.141526866911118 1677:0.156124608446181 1817:0.141018083523918 2162:0.170921341813635 2314:0.164249324532582 2358:0.508349039100422 2419:0.150582824316425 3266:0.338899359400281 3374:0.166725496161846 8311:0.219691455487068

I know that in the first line of data, 1 refers to a positive output and 6 refers to the target. The target refers to a word, and 0.0198403253586671 is the value. But I do not know how this value, 0.0198403253586671, is calculated.

2 Answers

0
votes

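Each line of that description defines one term of the format:
Line 1: a <line> is one line of the file (one example): a <target> followed by a list of <feature>:<value> pairs and, optionally, a comment (<info>) after the #.
Line 2: a <target> is +1, -1, 0, or a float.
Line 3: a <feature> is an integer (or the string "qid"), and so on.
So in your example, 1 is the target, 6 is a feature number, and 0.0198403253586671 is that feature's value. Taken together, the rules are the grammar of all possible inputs.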
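
To make the grammar concrete, here is a minimal Python sketch that splits one line into its parts. The function name parse_svmlight_line and the sample line are made up for illustration; this is not part of SVMlight itself, just one way to read the format described above.

```python
def parse_svmlight_line(line):
    """Split one line in the format above into (target, features, info)."""
    # Everything after the first '#' is the free-text <info> comment.
    body, _, info = line.partition("#")
    tokens = body.split()

    # The first token is the <target>: +1, -1, 0, or a float.
    target = float(tokens[0])

    # Every remaining token is a <feature>:<value> pair.
    features = {}
    for token in tokens[1:]:
        name, _, value = token.partition(":")
        # <feature> is an integer, or the special string "qid".
        key = name if name == "qid" else int(name)
        features[key] = float(value)

    return target, features, info.strip()


# Hypothetical sample line, shortened from the kind of data shown in the question.
target, features, info = parse_svmlight_line("1 6:0.0198 15:0.0339 29:0.0360 # doc-1")
print(target)    # the label of this example
print(features)  # feature number -> value
print(info)      # the optional comment
```

Reading it this way, the leading number is the label, and each number before a colon is a feature (for text, typically a word's index in a dictionary), not the target.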

0
votes

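I'm using SVMlight for sentiment analysis. Each feature is a term, and its value is a weight calculated from the frequency of that term in the document (for example a raw count, a normalized term frequency, or a TF-IDF weight).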
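
As a rough sketch of how such values could be produced, the Python below counts terms in a document, scales the counts to unit length, and writes one line in the SVMlight format. The weighting scheme, the function name to_svmlight_line, and the tiny vocabulary are all assumptions for illustration; the thread does not say which weighting the example file on the SVMlight site actually uses.

```python
import math
from collections import Counter


def to_svmlight_line(target, tokens, vocabulary):
    """Build one SVMlight line from a tokenized document.

    vocabulary maps each known term to its integer feature number (from 1).
    """
    # Term frequency: how often each known term occurs in the document.
    counts = Counter(vocabulary[t] for t in tokens if t in vocabulary)
    if not counts:
        return f"{target:+d}"

    # Assumed weighting: scale the count vector to Euclidean length 1.
    norm = math.sqrt(sum(c * c for c in counts.values()))

    # SVMlight expects the pairs ordered by increasing feature number;
    # terms that do not occur are simply left out.
    pairs = sorted((index, count / norm) for index, count in counts.items())
    return f"{target:+d} " + " ".join(f"{i}:{w:.6f}" for i, w in pairs)


# Hypothetical vocabulary and document.
vocabulary = {"good": 1, "bad": 2, "movie": 3}
line = to_svmlight_line(1, "good good movie".split(), vocabulary)
print(line)  # +1 1:0.894427 3:0.447214
```

Here "good" occurs twice and "movie" once, so their weights are 2/sqrt(5) and 1/sqrt(5). A TF-IDF scheme would additionally multiply each count by an inverse document frequency before normalizing.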