I'm working on a project that involves implementing some algorithms as Python classes and testing their performance. I decided to write them up as sklearn estimators so that I could use `GridSearchCV` for validation.
However, one of my algorithms, for Inductive Matrix Completion (IMC), takes more than just `X` and `y` as arguments. This becomes a problem for `GridSearchCV.fit`, as there appears to be no way to pass anything beyond `X` and `y` through to the estimator's `fit` method. The source shows the following arguments for `GridSearchCV.fit`:
```python
def fit(self, X, y=None, groups=None, **fit_params):
```
Naturally, the downstream methods expect only `X` and `y` as well, and modifying my local copy of `GridSearchCV` to accommodate my needs would be neither trivial nor advisable.
For reference, IMC states that $R \approx X W^T H Y^T$. So my `fit` method takes the following form:
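To make the shapes in $R \approx X W^T H Y^T$ concrete, here is a small NumPy sketch. The sizes `n, m, d1, d2, k` and the random data are illustrative assumptions, as is reading `W` and `H` as `k`-row factor matrices (which is what the transpose placement in the formula implies). The point to notice is that `X` and `Y` generally have different numbers of rows, because they describe the rows and the columns of `R` respectively.

```python
import numpy as np

# Illustrative sizes (assumptions): R is n x m, X is n x d1, Y is m x d2, rank k
n, m, d1, d2, k = 6, 4, 3, 2, 2
rng = np.random.default_rng(0)

X = rng.standard_normal((n, d1))   # side information for the rows of R
Y = rng.standard_normal((m, d2))   # side information for the columns of R
W = rng.standard_normal((k, d1))   # factor acting on X (k x d1)
H = rng.standard_normal((k, d2))   # factor acting on Y (k x d2)

# R_hat = X W^T H Y^T : (n x d1)(d1 x k)(k x d2)(d2 x m) -> n x m
R_hat = X @ W.T @ H @ Y.T
print(R_hat.shape)  # (6, 4)
```

Because the product passes through a `k`-dimensional bottleneck, `R_hat` has rank at most `k`.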
```python
def fit(self, R, X, Y):
```
So trying the following fails, because the `Y` value never reaches `IMC.fit`. With the signature above, the third positional argument is bound to `groups` instead:
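```python
from sklearn.model_selection import GridSearchCV

imc = IMC()
params = {...}
gs = GridSearchCV(imc, param_grid=params)
gs.fit(R, X, Y)  # Y binds to `groups`, not to IMC.fit
```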
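Python's own argument binding shows where `Y` ends up. The sketch below binds the call `gs.fit(R, X, Y)` against a hypothetical stand-in function that has the same signature as the `GridSearchCV.fit` quoted above (it does no fitting), using `inspect.signature(...).bind`. Strings stand in for the three matrices. `R` lands in `X`, `X` lands in `y`, and `Y` silently lands in `groups`. More recent scikit-learn releases no longer accept a third positional argument here, so with those the same call should fail with a `TypeError` instead.

```python
import inspect

# Hypothetical stand-in with the GridSearchCV.fit signature quoted above.
def fit(self, X, y=None, groups=None, **fit_params):
    pass

R, X, Y = "R", "X", "Y"  # placeholders for the three matrices

# Bind the arguments exactly as gs.fit(R, X, Y) would (None plays `self`)
bound = inspect.signature(fit).bind(None, R, X, Y)
print(dict(bound.arguments))
# {'self': None, 'X': 'R', 'y': 'X', 'groups': 'Y'}
```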
I've created a workaround by modifying the `IMC.fit` method as follows (the same check also has to be inserted into the `score` method):
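```python
import numpy as np

def fit(self, R, X, Y=None):
    if Y is None:
        # X is really [X | column of 999s | Y]: split on the first all-999 column
        split = np.where(np.all(X == 999, axis=0))[0][0]
        Y = X[:, split + 1:]
        X = X[:, :split]
    ...
```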
This allows me to use `numpy.hstack` to stack `X` and `Y` horizontally with a column of all 999s between them. The resulting array can then be passed to `GridSearchCV.fit` as follows:
```python
import numpy as np
data = np.hstack([X, np.ones((X.shape[0], 1)) * 999, Y])  # [X | 999s | Y]
gs.fit(R, data)
```
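The stacking and splitting steps round-trip as sketched below. The helper names `stack_side_info` and `split_side_info` and the small arrays are hypothetical, chosen only for illustration; the split uses the same logic as the `fit` workaround. Two caveats follow directly from the NumPy calls: `np.hstack` requires `X` and `Y` to have the same number of rows, and the split picks the first all-999 column, so a genuine feature column that happens to be all 999 would be mistaken for the separator.

```python
import numpy as np

SENTINEL = 999

def stack_side_info(X, Y):
    # Hypothetical helper: [X | column of 999s | Y]; needs X.shape[0] == Y.shape[0]
    return np.hstack([X, np.full((X.shape[0], 1), SENTINEL), Y])

def split_side_info(data):
    # Same logic as the fit workaround: locate the first all-sentinel column
    split = np.where(np.all(data == SENTINEL, axis=0))[0][0]
    return data[:, :split], data[:, split + 1:]

X = np.arange(6, dtype=float).reshape(3, 2)          # 3 x 2, no 999 values
Y = np.arange(9, dtype=float).reshape(3, 3) + 100    # 3 x 3, no 999 values
data = stack_side_info(X, Y)
X_back, Y_back = split_side_info(data)
print(data.shape)  # (3, 6)
print(np.array_equal(X_back, X), np.array_equal(Y_back, Y))  # True True
```

Since `GridSearchCV` indexes `y` with each fold's train and test indices, the `Y` block inside `data` is also sliced by the rows of `R`, which only makes sense if the rows of `Y` line up with the rows of `R`.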
This approach works, but feels pretty hacky. Therefore my question is this:
`**fit_params` but ended up running into an issue with the scorer. Apparently the `fit_params` do not get cascaded down to the scoring method, and since I am using the default `score` method in my class, the additional matrices do not get passed in. See `sklearn.model_selection._validation._score`. – Grr