
Is it possible to do the following in scikit-learn? We train an estimator A on the given mapping from features to targets, then train another estimator B on the same data (or mapping). We then use the outputs of the two trained estimators (A and B) as inputs to an estimator C, keeping the same target as before.

In other words, we train two estimators (predictors) and then combine the "weak" predictions coming from estimators A and B to get a better ("stronger") prediction. To find the best way to combine the two sets of predictions (from estimators A and B) we run another training step; in effect, we learn automatically how to combine the given predictions.

So, we have the following structure:

A -> C
B -> C

Now, I want to have the same "tree" of estimators, but I want to train them simultaneously. By that I mean that the expertise of estimators A and B should not be evaluated by their ability to predict the target directly; instead, I want to evaluate their expertise by their ability to improve the predictions coming from estimator C.


1 Answer


You can write your own transformer that'll transform input into predictions. Something like this:

import sklearn.base

class PredictionTransformer(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin):
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Fit the wrapped estimator on the training data.
        self.estimator.fit(X, y)
        return self

    def transform(self, X):
        # Expose the wrapped estimator's class probabilities as features.
        return self.estimator.predict_proba(X)

Then you can use FeatureUnion to glue your transformers together.
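As a minimal sketch of that gluing step (estimator choices here are illustrative, not prescribed): wrap two base classifiers in `PredictionTransformer`, stack their probability outputs side by side with `FeatureUnion`, and feed the result to a meta-learner C inside a `Pipeline`.

```python
# Sketch: combine two base predictors (A, B) via FeatureUnion,
# then train a meta-learner (C) on their concatenated probabilities.
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

class PredictionTransformer(BaseEstimator, TransformerMixin):
    """Turns any classifier into a transformer that outputs predict_proba."""
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def transform(self, X):
        return self.estimator.predict_proba(X)

X, y = make_classification(n_samples=200, random_state=0)

stack = Pipeline([
    # FeatureUnion horizontally stacks each transformer's output,
    # so C sees [proba_A | proba_B] as its input features.
    ("base", FeatureUnion([
        ("A", PredictionTransformer(LogisticRegression())),
        ("B", PredictionTransformer(DecisionTreeClassifier(random_state=0))),
    ])),
    ("C", LogisticRegression()),  # meta-learner combining A's and B's predictions
])
stack.fit(X, y)
predictions = stack.predict(X)
```

Because the whole thing is a regular `Pipeline`, it plugs into `GridSearchCV`, cross-validation, etc. like any other estimator.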

That said, there's a caveat: this technique is known as stacking, and it is prone to overfitting when all classifiers are trained on the same data. You might therefore want to write something more sophisticated that partitions the training set into two parts: one to fit the base predictors, and one to fit the meta-learner (the one that combines the base predictions).
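A rough sketch of that partitioning idea (the split ratio and the particular estimators are arbitrary choices for illustration): fit A and B on one half of the data, then fit C on the other half using A's and B's held-out probabilities as features, so C never sees predictions made on rows the base models were trained on.

```python
# Sketch: fit base predictors and the meta-learner on disjoint halves
# of the training data to reduce stacking's overfitting.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, random_state=0)
X_base, X_meta, y_base, y_meta = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Base predictors A and B see only the first half.
A = LogisticRegression().fit(X_base, y_base)
B = DecisionTreeClassifier(random_state=0).fit(X_base, y_base)

# Meta-learner C is fit on held-out predictions over the second half.
meta_features = np.hstack([A.predict_proba(X_meta), B.predict_proba(X_meta)])
C = LogisticRegression().fit(meta_features, y_meta)

def stacked_predict(X_new):
    """Predict by routing both base models' probabilities through C."""
    feats = np.hstack([A.predict_proba(X_new), B.predict_proba(X_new)])
    return C.predict(feats)

preds = stacked_predict(X_meta)
```

For what it's worth, recent scikit-learn versions also ship `sklearn.ensemble.StackingClassifier`, which performs this base/meta separation internally via cross-validation.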