0 votes

I am performing feature selection on a dataset with 100,000 rows and 32 features, using multinomial logistic regression in Python. What would be the most efficient way to select features in order to build a model for a multiclass target variable (classes 1, 2, 3, 4, 5, 6, 7)?

Feature selection is a huge topic. I'm voting to close this question as too broad. – juanpa.arrivillaga

This seems more like a statistical question and should be at stats.stackexchange.com. But to give you something to google: you could do a Lasso to select features. But yeah, that is a huge topic. – Marvin Taschenberger

Check Boruta feature selection on the web; I have tried it and it works well. It mimics the sklearn interface, so you can use it to select features and then apply the classifier/regressor. However, there are many methods for feature selection/extraction. – seralouk
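A minimal sketch of the Lasso idea from the comments, using scikit-learn's L1-penalized logistic regression. The synthetic data is a stand-in matching the question's shape, and the C value is an illustrative guess that you would normally tune (e.g. by cross-validation):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# stand-in data with the question's shape: 100,000 rows, 32 features, 7 classes
X, y = make_classification(n_samples=100_000, n_features=32, n_informative=10,
                           n_classes=7, random_state=0)

# the L1 penalty drives coefficients of uninformative features to exactly zero;
# C controls the penalty strength (smaller C -> fewer features kept)
lasso_lr = LogisticRegression(penalty='l1', solver='saga', C=0.1, max_iter=5000)

selector = SelectFromModel(lasso_lr).fit(X, y)
print(selector.get_support())       # boolean mask of the kept features
X_selected = selector.transform(X)  # reduced feature matrix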

1 Answer

2 votes

Of course there are several methods for choosing your features, but sometimes the following simple approach can help. You can assess the contribution of your features (their potential for predicting the response variable) with the help of linear models. Note that this mainly works when you suspect a linear dependence between your features and the response.

import statsmodels.api as sm
import statsmodels.formula.api as smf

# Guerry dataset from the statsmodels docs; Lottery is Y,
# the fields to the right of ~ are the candidate features
df = sm.datasets.get_rdataset("Guerry", "HistData").data

mod = smf.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df)
res = mod.fit()
print(res.summary())

OLS Regression Results                            
==============================================================================
Dep. Variable:                Lottery   R-squared:                       0.338
Model:                            OLS   Adj. R-squared:                  0.287
Method:                 Least Squares   F-statistic:                     6.636
Date:                Tue, 28 Feb 2017   Prob (F-statistic):           1.07e-05
Time:                        21:36:08   Log-Likelihood:                -375.30
No. Observations:                  85   AIC:                             764.6
Df Residuals:                      78   BIC:                             781.7
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept      38.6517      9.456      4.087      0.000      19.826      57.478
Region[T.E]   -15.4278      9.727     -1.586      0.117     -34.793       3.938
Region[T.N]   -10.0170      9.260     -1.082      0.283     -28.453       8.419
Region[T.S]    -4.5483      7.279     -0.625      0.534     -19.039       9.943
Region[T.W]   -10.0913      7.196     -1.402      0.165     -24.418       4.235
Literacy       -0.1858      0.210     -0.886      0.378      -0.603       0.232
Wealth          0.4515      0.103      4.390      0.000       0.247       0.656
==============================================================================
Omnibus:                        3.049   Durbin-Watson:                   1.785
Prob(Omnibus):                  0.218   Jarque-Bera (JB):                2.694
Skew:                          -0.340   Prob(JB):                        0.260
Kurtosis:                       2.454   Cond. No.                         371.
==============================================================================

The higher the R-squared value, the better your chosen combination of features predicts the response in a linear model. (When comparing models with different numbers of features, prefer adjusted R-squared, since plain R-squared never decreases as you add features.) If the features can predict the response in a linear model then, I think, they have even greater potential with more complex models such as decision trees.
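For example, a minimal sketch (reusing df as loaded above) comparing two feature subsets; the formulas are just illustrative choices:

# compare candidate feature subsets by adjusted R-squared,
# which penalizes models that add features without adding signal
r2_full = smf.ols('Lottery ~ Literacy + Wealth + Region', data=df).fit().rsquared_adj
r2_small = smf.ols('Lottery ~ Wealth', data=df).fit().rsquared_adj
print('full: %.3f, Wealth only: %.3f' % (r2_full, r2_small))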

Please see the following page for more details (note that some additional data handling may be required to get correct results if the errors in your data are heteroskedastic): http://www.statsmodels.org/dev/example_formulas.html
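As one sketch of such handling, assuming you do suspect heteroskedastic errors, statsmodels can refit the same model with heteroskedasticity-robust (HC3) standard errors:

# coefficients are unchanged, but the std err / t / p-values
# in the summary are corrected for heteroskedasticity
res_robust = mod.fit(cov_type='HC3')
print(res_robust.summary())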

And of course I recommend building a pair plot for your features too.
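A quick sketch using seaborn (assuming it is installed), again on the df loaded above; the column choice is just illustrative:

import seaborn as sns
import matplotlib.pyplot as plt

# scatter plots for every pair of the selected columns,
# with each column's distribution on the diagonal
sns.pairplot(df[['Lottery', 'Literacy', 'Wealth']])
plt.show()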

These methods are not very deep; they rely on correlations and visual inspection, but in simple situations they are pragmatic.