I have a regression task and am predicting with linear regression and random forest models. I need some hints or a code example showing how to ensemble them (simple averaging is already done). Here are my model implementations in Python:

import numpy as np

np.random.seed(42)
mask = np.random.rand(happiness2.shape[0]) <= 0.7

print('Train set shape {0}, test set shape {1}'.format(happiness2[mask].shape, happiness2[~mask].shape))

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(happiness22[mask].drop(['Country', 'Happiness_Score_2017',
                               'Happiness_Score_2018','Happiness_Score_2019'], axis=1).fillna(0), 
       happiness22[mask]['Happiness_Score_2019'] )

pred = lr.predict(happiness22[~mask].drop(['Country', 'Happiness_Score_2017',
                               'Happiness_Score_2018','Happiness_Score_2019'], axis=1).fillna(0)) 
print('RMSE = {0:.04f}'.format(np.sqrt(np.mean((pred - happiness22[~mask]['Happiness_Score_2019'])**2)))) 

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100)
rf.fit(happiness22[mask].drop(['Country', 'Happiness_Score_2017',
                               'Happiness_Score_2018','Happiness_Score_2019'], axis=1).fillna(0), 
       happiness22[mask]['Happiness_Score_2019'] )
pred3 = rf.predict(happiness22[~mask].drop(['Country', 'Happiness_Score_2017',
                               'Happiness_Score_2018','Happiness_Score_2019'], axis=1).fillna(0))
print('RMSE = {0:.04f}'.format(np.sqrt(np.mean((pred3 - happiness22[~mask]['Happiness_Score_2019'])**2))))

avepred=(pred+pred3)/2
print('RMSE = {0:.04f}'.format(np.sqrt(np.mean((avepred - happiness22[~mask]['Happiness_Score_2019'])**2))))
@ShihabShahriarKhan Is it suitable for regression tasks? I know this one, but in my opinion it's for classification tasks. - Adolf Miszka

1 Answer

First, evaluate each model (linear regression and random forest) on a validation set and compute its error (MSE, for instance). Then weight each model according to that error, for example in proportion to the inverse of its MSE so that the more accurate model counts more, and use those weights when combining the predictions.
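A minimal sketch of this error-based weighting is shown here. It assumes inverse-MSE weights normalized to sum to 1, which is only one reasonable choice, and it uses synthetic data from `make_regression` as a stand-in for the happiness dataset. The three-way split and the names `X_val`, `weights`, and `rmse` are illustrative, not part of the original code.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): swap in your own features and target.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

# Train set fits the models, validation set derives the weights,
# test set reports the final error (so the weights are not tuned on it).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)

models = {
    "lr": LinearRegression(),
    "rf": RandomForestRegressor(n_estimators=100, random_state=42),
}
for model in models.values():
    model.fit(X_train, y_train)

# Validation MSE per model, in the same order as `models`.
val_mse = np.array(
    [mean_squared_error(y_val, m.predict(X_val)) for m in models.values()])

# Inverse-MSE weights, normalized to sum to 1: lower error gives a larger weight.
inverse_mse = 1.0 / val_mse
weights = inverse_mse / inverse_mse.sum()

# One column of test predictions per model, combined as a weighted average.
test_preds = np.column_stack([m.predict(X_test) for m in models.values()])
ensemble_pred = test_preds @ weights


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


for name, weight, column in zip(models, weights, test_preds.T):
    print('{0}: weight = {1:.3f}, RMSE = {2:.4f}'.format(
        name, weight, rmse(y_test, column)))
print('weighted ensemble: RMSE = {0:.4f}'.format(rmse(y_test, ensemble_pred)))
```

With equal weights this reduces to the plain average already tried in the question. Whether the weighted version actually helps depends on the data, so compare both on the held-out test set.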

You can also use the COBRA ensemble method (developed by Guedj et al.), available in the pycobra package: https://modal.lille.inria.fr/pycobra/