
After much trial and error I have finally managed to build my own stacked model, but I cannot reproduce the same accuracy every time. I know I have to set the random_state parameter to a fixed value, but even after explicitly setting random_state to a value before calling the class method I still get random results.

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.model_selection import KFold

class Stacking(BaseEstimator, ClassifierMixin):
    def __init__(self, BaseModels, MetaModel, nfolds=3, seed=1):
        self.BaseModels = BaseModels
        self.MetaModel = MetaModel
        self.nfolds = nfolds
        # This fixed my error, thanks to foladev. (Note: np.random.seed
        # returns None, so self.seed ends up as None; the call seeds the
        # global NumPy RNG as a side effect.)
        self.seed = np.random.seed(seed)

    def fit(self, X, y):
        self.BaseModels_ = [list() for _ in self.BaseModels]
        self.MetaModel_ = clone(self.MetaModel)
        # random_state has no effect while shuffle=False; newer
        # scikit-learn versions reject this combination outright.
        kf = KFold(n_splits=self.nfolds, shuffle=False, random_state=6)
        out_of_fold_preds = np.zeros((X.shape[0], len(self.BaseModels_)))

        # Iterate over the original estimators, not the (empty) clone lists.
        for index, model in enumerate(self.BaseModels):
            for train_index, out_of_fold_index in kf.split(X, y):
                instance = clone(model)
                self.BaseModels_[index].append(instance)
                instance.fit(X[train_index], y[train_index])

                preds = instance.predict(X[out_of_fold_index])
                out_of_fold_preds[out_of_fold_index, index] = preds
        self.MetaModel_.fit(out_of_fold_preds, y)
        return self
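
(As posted, the class also has no predict method; here is a minimal sketch of one, assuming the usual convention of averaging each base model's per-fold predictions into a single meta-feature column.)

    def predict(self, X):
        # One meta-feature column per base model: average the predictions
        # of its per-fold clones, then let the fitted meta-model decide.
        meta_features = np.column_stack([
            np.column_stack([m.predict(X) for m in fold_models]).mean(axis=1)
            for fold_models in self.BaseModels_
        ])
        return self.MetaModel_.predict(meta_features)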

I am using LogisticRegression, SGDClassifier, and RandomForestClassifier as my base models and XGBoost as my meta-model. random_state is present in all the models, but it only works on the base models.

I get the error "__init__() got an unexpected keyword argument 'random_state'" when random_state is passed to XGBClassifier.

Please note, I have tried setting random_state before calling the class, and I have tried altering shuffle in KFold. Also, how can I initialize parameters inside the class methods?
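
For reference, the stack is instantiated roughly like this (a sketch; the hyperparameter values and the X_train/y_train arrays are placeholders):

from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

base_models = [
    LogisticRegression(random_state=1),   # random_state works on the base models
    SGDClassifier(random_state=1),
    RandomForestClassifier(random_state=1),
]
meta_model = XGBClassifier()              # adding random_state here raises the error above

stack = Stacking(BaseModels=base_models, MetaModel=meta_model, nfolds=3, seed=1)
stack.fit(X_train, y_train)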


1 Answer


From the API, it looks like XGBClassifier uses seed:

xgboost.XGBClassifier(
    max_depth=3, 
    learning_rate=0.1, 
    n_estimators=100, 
    silent=True, 
    objective='binary:logistic', 
    booster='gbtree', 
    n_jobs=1, 
    nthread=None, 
    gamma=0, 
    min_child_weight=1, 
    max_delta_step=0, 
    subsample=1, 
    colsample_bytree=1, 
    colsample_bylevel=1, 
    reg_alpha=0, 
    reg_lambda=1, 
    scale_pos_weight=1, 
    base_score=0.5, 
    random_state=0, 
    seed=None, 
    missing=None, 
    **kwargs
)
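
So, assuming a version of xgboost with that signature, the meta-model would be seeded through seed rather than random_state (the values here are arbitrary examples):

from xgboost import XGBClassifier

# seed, not random_state, is the reproducibility parameter in this API version.
meta_model = XGBClassifier(objective='binary:logistic', n_estimators=100, seed=42)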

May I ask why you do not set a class-level seed and apply it to all of the models?
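
To illustrate, a minimal sketch of that idea, assuming every estimator exposes its seeding parameter through scikit-learn's get_params/set_params interface:

from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import KFold

class Stacking(BaseEstimator, ClassifierMixin):
    def __init__(self, BaseModels, MetaModel, nfolds=3, seed=1):
        self.BaseModels = BaseModels
        self.MetaModel = MetaModel
        self.nfolds = nfolds
        self.seed = seed  # store the integer itself, not np.random.seed(...)

    def fit(self, X, y):
        # Propagate the single class-level seed everywhere randomness appears.
        kf = KFold(n_splits=self.nfolds, shuffle=True, random_state=self.seed)
        for model in self.BaseModels + [self.MetaModel]:
            params = model.get_params()
            if 'random_state' in params:
                model.set_params(random_state=self.seed)
            elif 'seed' in params:          # e.g. older XGBClassifier versions
                model.set_params(seed=self.seed)
        # ... rest of fit as in the question ...
        return self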