If you inspect the file svm-scale.c you will find that the formula that scales data is:
value = y_lower + (y_upper-y_lower) * (value - y_min)/(y_max-y_min);
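Here y_lower and y_upper are the y scaling limits. Features are scaled by the same formula, using the x limits lower and upper and each feature's own minimum and maximum.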
So, as you can see, the scaled value is not computed the way you supposed ("subtracting the value from the minimum and then dividing by the range for particular feature"). To recover the original value, you only have to invert the formula.
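Inverting the formula can be sketched in Python as follows. This is only an illustration: the function name unscale and its parameter names are my own, not part of libSVM, and the numbers in the example are made up.

```python
def unscale(scaled, v_min, v_max, lower=-1.0, upper=1.0):
    """Invert scaled = lower + (upper - lower) * (x - v_min) / (v_max - v_min).

    v_min and v_max are the minimum and maximum recorded for the feature,
    lower and upper are the scaling limits (hypothetical parameter names).
    """
    if upper == lower:
        raise ValueError("upper and lower must differ")
    return v_min + (scaled - lower) * (v_max - v_min) / (upper - lower)


# Made-up example: a feature that ranged over [0, 10] and was scaled to [-1, 1].
print(unscale(0.5, v_min=0.0, v_max=10.0))  # 7.5
```

The minimum and maximum have to come from the same scaling run that produced the scaled value, otherwise the recovered number is meaningless.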
Example:
If you take one of the many datasets available as examples on the libSVM site, such as the covtype dataset, and open it, you will see a file like this one:
1 1:2596 2:51 3:3 4:258 6:510 7:221 8:232 9:148 10:6279 11:1 43:1
1 1:2590 2:56 3:2 4:212 5:-6 6:390 7:220 8:235 9:151 10:6225 11:1 43:1
2 1:2804 2:139 3:9 4:268 5:65 6:3180 7:234 8:238 9:135 10:6121 11:1 26:1
2 1:2785 2:155 3:18 4:242 5:118 6:3090 7:238 8:238 9:122 10:6211 11:1 44:1
1 1:2595 2:45 3:2 4:153 5:-1 6:391 7:220 8:234 9:150 10:6172 11:1 43:1
...
Now let's scale it using:
./svm-scale -s covtype.libsvm.binary.range covtype.libsvm.binary > covtype.libsvm.binary.scale
This produces two files: the .range file, which holds the information about the scaling process (min and max per column), and the .scale file, which is the output and looks like:
1 1:-0.262631 2:-0.716667 3:-0.909091 4:-0.630637 5:-0.552972 6:-0.856681 7:0.740157 8:0.826772 9:0.165354 10:0.750732 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1
1 1:-0.268634 2:-0.688889 3:-0.939394 4:-0.696492 5:-0.568475 6:-0.890403 7:0.732283 8:0.850394 9:0.188976 10:0.735675 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1
2 1:-0.0545273 2:-0.227778 3:-0.727273 4:-0.616321 5:-0.385013 6:-0.106365 7:0.84252 8:0.874016 9:0.0629921 10:0.706678 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:-1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1
2 1:-0.0735368 2:-0.138889 3:-0.454545 4:-0.653543 5:-0.248062 6:-0.131657 7:0.874016 8:0.874016 9:-0.0393701 10:0.731772 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:-1 44:1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1
1 1:-0.263632 2:-0.75 3:-0.939394 4:-0.780959 5:-0.555556 6:-0.890122 7:0.732283 8:0.84252 9:0.181102 10:0.720898 11:1 12:-1 13:-1 14:-1 15:-1 16:-1 17:-1 18:-1 19:-1 20:-1 21:-1 22:-1 23:-1 24:-1 25:-1 26:-1 27:-1 28:-1 29:-1 30:-1 31:-1 32:-1 33:-1 34:-1 35:-1 36:-1 37:-1 38:-1 39:-1 40:-1 41:-1 42:-1 43:1 44:-1 45:-1 46:-1 47:-1 48:-1 49:-1 50:-1 51:-1 52:-1 53:-1 54:-1
...
The .range file looks like:
x
-1 1
1 1859 3858
2 0 360
3 0 66
4 0 1397
...
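If you want to read these values programmatically, a minimal sketch could look like the one below. It assumes the layout shown above (an "x" line, then the lower and upper limits, then one "index min max" line per feature) and does not handle a file that also contains a y section. The function name parse_range is hypothetical.

```python
def parse_range(text):
    """Parse the x section of an svm-scale range file passed as a string.

    Returns (lower, upper, ranges), where ranges maps a feature index
    to its (min, max) pair.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or lines[0] != ["x"]:
        raise ValueError("expected a range file starting with an 'x' line")
    lower, upper = map(float, lines[1])
    ranges = {}
    for idx, v_min, v_max in lines[2:]:
        ranges[int(idx)] = (float(v_min), float(v_max))
    return lower, upper, ranges


# The first lines of the .range file shown above.
sample = """x
-1 1
1 1859 3858
2 0 360
3 0 66
4 0 1397
"""
lower, upper, ranges = parse_range(sample)
print(lower, upper, ranges[1])  # -1.0 1.0 (1859.0, 3858.0)
```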
So, taking into account that y_lower = -1 and y_upper = 1 (the second line of the .range file), you can verify the conversion for the first element, 2596:
value = -1 + (1 - (-1)) * (2596 - 1859) / (3858 - 1859) = -0.26263131565782893
This matches the first scaled feature in the output above, 1:-0.262631 :)
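The same check can be repeated for the first four features of the first row with a few lines of Python. This is just a sketch: the scale function below re-implements the formula by hand, and the raw values, ranges and expected results are copied from the listings above.

```python
def scale(value, v_min, v_max, lower=-1.0, upper=1.0):
    """Apply lower + (upper - lower) * (value - v_min) / (v_max - v_min)."""
    return lower + (upper - lower) * (value - v_min) / (v_max - v_min)


# (min, max) per feature, from the .range file.
ranges = {1: (1859, 3858), 2: (0, 360), 3: (0, 66), 4: (0, 1397)}
# First row of the unscaled file.
raw = {1: 2596, 2: 51, 3: 3, 4: 258}
# First row of the .scale file.
expected = {1: -0.262631, 2: -0.716667, 3: -0.909091, 4: -0.630637}

for idx in sorted(raw):
    got = scale(raw[idx], *ranges[idx])
    print(f"{idx}:{got:g}")
    # The .scale file keeps six significant digits, so compare with a tolerance.
    assert abs(got - expected[idx]) < 1e-6
```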
Tip:
Normally you scale your training set with svm-scale, build your model (using k-fold cross validation), and finally scale the test data with the values (y_max and y_min) obtained from training before testing. The -r option of svm-scale restores a saved range file for this purpose. You can see the process in the file tools/easy.py.
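To illustrate why the training ranges are reused, here is a small Python sketch with made-up data. The helpers fit_ranges and apply_ranges are hypothetical names, not libSVM functions, and the sketch ignores the implicit zeros of the sparse format, so it is a simplification of what svm-scale does.

```python
def fit_ranges(rows):
    """Collect per-feature (min, max) over rows given as {index: value} dicts."""
    ranges = {}
    for row in rows:
        for idx, val in row.items():
            lo, hi = ranges.get(idx, (val, val))
            ranges[idx] = (min(lo, val), max(hi, val))
    return ranges


def apply_ranges(row, ranges, lower=-1.0, upper=1.0):
    """Scale one row using ranges that were computed on the training set."""
    out = {}
    for idx, val in row.items():
        if idx not in ranges:
            continue  # feature never seen during training
        v_min, v_max = ranges[idx]
        if v_max == v_min:
            continue  # single-valued feature carries no information
        out[idx] = lower + (upper - lower) * (val - v_min) / (v_max - v_min)
    return out


train = [{1: 0.0, 2: 10.0}, {1: 4.0, 2: 20.0}]
test = [{1: 2.0, 2: 25.0}]

ranges = fit_ranges(train)  # computed from training data only
scaled_test = [apply_ranges(row, ranges) for row in test]
print(scaled_test)  # [{1: 0.0, 2: 2.0}]
```

Note that the second test feature ends up at 2.0, outside [-1, 1], because 25 is larger than anything seen in training. That is expected: the test set must go through exactly the same transformation as the training set, not be rescaled on its own.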