I have an estimator like this:
from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np

class customEstimator(BaseEstimator, TransformerMixin):
    def __init__(self, estimator_var):
        self.estimator_var = estimator_var

    def transform(self, X):
        # this may be slow to compute
        self.tmpVar = np.random.randn(self.estimator_var, self.estimator_var)
        return np.hstack((self.tmpVar, X))  # this is just an example

    def fit(self, X, y=None):
        return self

    def get_params(self, deep=False):
        # only constructor parameters belong here; returning tmpVar as well
        # would break cloning (BaseEstimator already provides this method)
        return {'estimator_var': self.estimator_var}
I then create a pipeline containing this estimator (and others) and feed it into GridSearchCV for k-fold cross-validation. The search goes something like this:
for each parameter combination:
    for each fold split:
        compute score(mini_train, mini_test)
    compute the average score
pick the best combination
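For reference, the full setup looks roughly like this (a minimal sketch: the final LogisticRegression step, the toy data, and the parameter grid are placeholders I made up, and I adjusted transform so the array shapes line up for a runnable example):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

class customEstimator(BaseEstimator, TransformerMixin):
    def __init__(self, estimator_var):
        self.estimator_var = estimator_var

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # the potentially slow computation: re-run on every fold split,
        # even though it only depends on estimator_var (and X's shape here)
        self.tmpVar = np.random.randn(X.shape[0], self.estimator_var)
        return np.hstack((self.tmpVar, X))

rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = rng.randint(0, 2, size=60)

pipe = Pipeline([
    ('custom', customEstimator(estimator_var=2)),
    ('clf', LogisticRegression()),
])
grid = GridSearchCV(pipe, param_grid={'custom__estimator_var': [1, 2, 3]},
                    cv=3)
grid.fit(X, y)
# transform() runs on every fold and every parameter value, so tmpVar is
# recomputed many times per parameter combination
```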
The issue is that, for a given combination of parameters, I would like to compute self.tmpVar (which may be slow to compute) only once, and reuse it across all fold splits that share the same parameter combination.
Would that be possible in scikit-learn, or is there a workaround?