Python Select variables in multiple linear regression

Question

I have a dependent variable y and 6 independent variables, and I want to fit a linear regression to them. I am using the sklearn library to do it.

The problem is that some of my independent variables have pairwise correlations above 0.5, so I can't have them in my model at the same time.

I searched the internet but didn't find a way to select the best set of independent variables for the regression and output which variables were selected.

One possibility is to first try a fit with all variables, and then remove from the regression the variable with the least significance and then re-run to see what happens to the fitting results. This test is easy to perform and might help in your analytical work. — James Phillips

ritchie46 · Accepted Answer · 2019-03-07T08:27:24

If your independent variables are correlated with each other, you should consider removing some of them from the model.
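As a rough illustration of removing correlated variables by hand, here is a minimal sketch using only NumPy. The helper `drop_correlated`, the column names, and the toy data are made up for this example, and the 0.5 threshold is simply the one from the question. It keeps a column only if its absolute correlation with every column kept so far is at or below the threshold, so the result depends on column order.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.5):
    """Greedily keep a column only if its absolute correlation with
    every already-kept column is at or below the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # columns are variables
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Toy data (hypothetical): x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, a + 0.1 * rng.normal(size=200), b])

selected = drop_correlated(X, ["x1", "x2", "x3"])
print(selected)
```

Which variable of a correlated pair to keep is a judgment call; this sketch just keeps whichever comes first.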

I see you are working with scikit-learn. If you don't want to do the feature selection manually, you can use one of the methods in scikit-learn's feature_selection module. There are several ways to remove features automatically, and you should cross-validate to determine which one works best for your problem.
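For example, one option from that module is `RFECV` (recursive feature elimination with cross-validation). The sketch below is only one possible setup, not necessarily the best choice for your data: the synthetic data, the column names, and the settings `cv=5` and `scoring="r2"` are assumptions made for illustration. `get_support()` returns a boolean mask, which lets you print the variables that were selected.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): 6 predictors, x2 strongly correlated with x1,
# and y depending only on x1 and x3.
rng = np.random.default_rng(0)
n_samples = 300
X = rng.normal(size=(n_samples, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n_samples)
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.5 * rng.normal(size=n_samples)
names = ["x1", "x2", "x3", "x4", "x5", "x6"]

# Drop one feature per step; cross-validation picks how many to keep.
selector = RFECV(LinearRegression(), step=1, cv=5, scoring="r2")
selector.fit(X, y)

# Map the boolean support mask back to variable names.
selected = [name for name, keep in zip(names, selector.get_support()) if keep]
print("Selected variables:", selected)
```

Note that `RFECV` ranks features by the magnitude of the fitted coefficients, so it is usually sensible to put the variables on a comparable scale first.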
