I ran an analysis on some data using Dell's Statistica software, and I am using this analysis in a scientific paper. Although data mining is not my primary field, I have taken a data mining class and have some background.

I know that the data is either split into training and test parts, for example 75%/25% (the proportions may vary), or n-fold cross-validation is used to assess model performance.

In Statistica's SVM module, there are tabs for configuring the model before it is run. In the data sampling tab I entered a 75%/25% split, and in the cross-validation tab I entered 10-fold cross-validation. In the output, I see that the data was indeed split into training and test sets (model predictions are given for the test values).

The output also reports a cross-validation error. I have copied the results below. I am having difficulty understanding and interpreting this output. I hope someone who knows statistics better than I do, or who has more experience with this tool, can explain how it works.

Ferda

Support Vector Machine results
SVM type: Regression type 1 (capacity=9.000, epsilon=0.100)
Kernel type: Radial Basis Function (gamma=0.053)
Number of support vectors = 705 (674 bounded)
Cross-validation error = 0.244
Mean error squared = 1.830 (Train), 0.193 (Test), 1.267 (Overall)
S.D. ratio = 0.952 (Train), 37076026627971.336 (Test), 0.977 (Overall)
Correlation coefficient = 0.314 (Train), -0.000 (Test), 0.272 (Overall)

1 Answer

I found that the Statistica website answers my question. In the Sampling tab, the data can be split into training and test sets. In the cross-validation tab, if for example 10 is selected, then 10-fold cross-validation is used to choose suitable SVM parameters (such as nu and epsilon) for the SVM model.
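To make the workflow concrete, here is a minimal Python sketch of the same idea. It is not Statistica's implementation: the data are synthetic, the helper names (`knn_predict`, `mse`, `cv_error`) are hypothetical, and a simple k-nearest-neighbour regressor stands in for the SVM so that the example needs only the standard library. The point is only the order of operations: hold out 25% as a test set, run 10-fold cross-validation on the remaining 75% to pick a parameter, then score the chosen model on the untouched test set.

```python
import random


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, held_out, k):
    """Mean squared error of a k-NN model built on `train`, scored on `held_out`."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in held_out]
    return sum(errors) / len(errors)


def cv_error(data, k, n_folds=10):
    """Average held-out MSE over n_folds folds of `data`."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, fold in enumerate(folds):
        # Fit on every fold except fold i, then score on fold i.
        rest = [p for j, f in enumerate(folds) if j != i for p in f]
        scores.append(mse(rest, fold, k))
    return sum(scores) / n_folds


# Synthetic data (an assumption for illustration): noisy y = 2x.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + rng.gauss(0, 1)))
rng.shuffle(data)

# Analogue of the Sampling tab: 75% training, 25% test.
cut = int(0.75 * len(data))
train, test = data[:cut], data[cut:]

# Analogue of the cross-validation tab: 10-fold CV on the training
# part only, used to choose the model parameter (here k).
candidates = [1, 3, 5, 9, 15]
cv_scores = {k: cv_error(train, k) for k in candidates}
best_k = min(cv_scores, key=cv_scores.get)

# The test set is used once, after the parameter has been chosen.
test_mse = mse(train, test, best_k)
print("chosen k:", best_k)
print("cross-validation error:", cv_scores[best_k])
print("test MSE:", test_mse)
```

Under this reading, the "Cross-validation error" in the output would correspond to `cv_scores[best_k]`, while the Train/Test columns come from the 75%/25% split.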

This explanation cleared up my confusion. I hope it helps people in similar situations.

Ferda