I am running ridge regression on a dataset with 5-fold cross-validation, so my dataset is split into 5 folds, each with its own train and test portion.
This is how I did it in scikit-learn:
# Note: sklearn.cross_validation is the pre-0.18 API (later moved to sklearn.model_selection)
from sklearn import cross_validation
k_fold = cross_validation.KFold(n=len(tourism_train_X), n_folds=5)
I set up the candidate values for the regularisation parameter like this:
import numpy as np
# Generate a grid of alpha values (regularization strengths)
n_alphas = 200
alphas = np.logspace(-10, -1, n_alphas)
Now, my question is this: for each alpha, I loop over the train and test folds like this:
from sklearn import linear_model

ridge_tourism = linear_model.Ridge()
coefs = []  # collects one coefficient vector per (alpha, fold) pair
for a in alphas:
    ridge_tourism.set_params(alpha=a)
    for train_indices, test_indices in k_fold:
        # Fit the model on this fold's training rows
        ridge_tourism.fit(tourism_train_X[train_indices], tourism_train_Y[train_indices])
        coefs.append(ridge_tourism.coef_)
The problem is that this gives me one coefficient vector for each of the five training folds, for every alpha. What I want is a single coefficient vector per alpha. How do we get that? Out of the 5 training sets, how do we choose which coefficient vector is finally reported for that alpha?
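To make the question concrete, here is a minimal sketch of one convention I have seen, which may or may not be the right one: use the folds only to score each alpha (mean held-out error over the 5 folds), and take the reported coefficient vector from a refit on the full training set rather than from any single fold. The synthetic data from make_regression stands in for tourism_train_X and tourism_train_Y, and the sketch uses the newer sklearn.model_selection API and a smaller alpha grid; all of these are assumptions, not what I ran above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic stand-in for tourism_train_X / tourism_train_Y (assumption).
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

alphas = np.logspace(-10, -1, 20)  # smaller grid than in the question, for speed
k_fold = KFold(n_splits=5)

mean_cv_mse = []  # one held-out error per alpha, averaged over the folds
full_coefs = []   # one coefficient vector per alpha, from a refit on all rows

for a in alphas:
    fold_mse = []
    for train_idx, test_idx in k_fold.split(X):
        model = Ridge(alpha=a).fit(X[train_idx], y[train_idx])
        fold_mse.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))
    mean_cv_mse.append(np.mean(fold_mse))
    # The fold models are used only for scoring; the coefficients kept for
    # this alpha come from refitting on the whole training set.
    full_coefs.append(Ridge(alpha=a).fit(X, y).coef_)

best = int(np.argmin(mean_cv_mse))
best_alpha = alphas[best]
best_coef = full_coefs[best]
print(best_alpha, best_coef)
```

If this convention is the intended one, RidgeCV(alphas=alphas, cv=5) looks like it does the same selection in one call and exposes the result as alpha_ and coef_.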