Scikit-Learn Linear Regression how to get coefficient's respective features?

Question

I'm trying to perform feature selection by evaluating my regressions coefficient outputs, and select the features with the highest magnitude coefficients. The problem is, I don't know how to get the respective features, as only coefficients are returned form the coef._ attribute. The documentation says:

Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.

I am passing into my regression.fit(A,B), where A is a 2-D array, with tfidf value for each feature in a document. Example format:

         "feature1"   "feature2"
"Doc1"    .44          .22
"Doc2"    .11          .6
"Doc3"    .22          .2

B are my target values for the data, which are just numbers 1-100 associated with each document:

"Doc1"    50
"Doc2"    11
"Doc3"    99

Using regression.coef_, I get a list of coefficients, but not their corresponding features! How can I get the features? I'm guessing I need to modfy the structure of my B targets, but I don't know how.

Kirsche Kirsche · Accepted Answer · 2017-04-29T19:41:31

What I found to work was:

X = your independent variables

coefficients = pd.concat([pd.DataFrame(X.columns),pd.DataFrame(np.transpose(logistic.coef_))], axis = 1)

The assumption you stated: that the order of regression.coef_ is the same as in the TRAIN set holds true in my experiences. (works with the underlying data and also checks out with correlations between X and y)

Scikit-Learn Linear Regression how to get coefficient's respective features?

7 Answers

Coefficients and features in zip

Coefficients and features in DataFrame