I came across an SVM example, but I didn't understand it. I would appreciate it if somebody could explain how the prediction works. Here is the setup:
The dataset has 10,000 observations with 5 attributes (Sepal Width, Sepal Length, Petal Width, Petal Length, Label). The label is positive if the observation belongs to the I.setosa class, and negative if it belongs to some other class.
There are 6000 observations for which the outcome is known (i.e., they belong to the I.setosa class, so they get a positive label). The labels of the remaining 4000 are unknown, so they were assumed to be negative. The 6000 known observations, plus 2500 randomly selected from the remaining 4000, form the set for 10-fold cross-validation. An SVM with 10-fold cross-validation is then trained on these 8500 observations, and an ROC curve is plotted.
Where is the prediction happening here? The set already contains 6000 observations whose labels are known. How did the 2500 sampled observations get negative labels when their true labels are unknown? When the SVM is applied, some observations labeled positive receive a negative prediction, which doesn't make sense to me. And why are the remaining 1500 observations excluded?
I hope my description is clear; please let me know if any part of it needs more detail.