0
votes

I would like to assess the performance of each predictor in a logistic regression model (mymodel1). Below are the p-values of the predictors; all of them are < .05, which means all predictors are significant (but are they equally important?). How do I get a measure of the importance of, or the information gained from, each predictor?

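# Wald z-statistics: coefficient estimates divided by their standard errors
z <- summary(mymodel1)$coefficients/summary(mymodel1)$standard.errors
# Two-sided p-values from the standard normal distribution
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p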
   (Intercept)        alpha         beta gamma theta
2 0.000000e+00 0.000000e+00 0.000000e+00     0     0
3 0.000000e+00 0.000000e+00 0.000000e+00     0     0
4 2.644718e-05 4.905187e-11 7.112932e-06     0     0
5 0.000000e+00 0.000000e+00 0.000000e+00     0     0
6 0.000000e+00 0.000000e+00 0.000000e+00     0     0

2 Answers

1
votes

There is a lot of debate around this topic, and it is hard to favor one method over another. Nevertheless, here are some of the methods used to assess the contribution of individual predictors.

  1. Standardize the regression coefficients

The higher the absolute value of a standardized coefficient, the higher the contribution of that predictor. I have also seen the following form:

Relative contribution = abs(standardized coefficient) / sum(abs(all standardized coefficients))

  2. Chi-square statistic

The higher the chi-square value, the higher the contribution. However, the chi-square value says nothing about the magnitude of the effect.

  3. Log-likelihood value

Run the regression with a single predictor and compare its log-likelihood value (-2LL) with that of the full model.
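
A minimal sketch of the three measures above, under some loud assumptions: it uses a binary logistic regression fitted with `glm` rather than the asker's `mymodel1` (which looks like a multi-class model), and the data are simulated. The predictor names `alpha`, `beta` and `gamma` are borrowed from the question, but their values and effect sizes are made up for illustration.

```r
# Simulated data, purely illustrative (not the asker's data)
set.seed(1)
n <- 500
dat <- data.frame(alpha = rnorm(n), beta = rnorm(n, sd = 5), gamma = rnorm(n))
eta <- 1.5 * dat$alpha + 0.1 * dat$beta + 0.2 * dat$gamma
dat$y <- rbinom(n, 1, plogis(eta))
preds <- c("alpha", "beta", "gamma")

# 1. Standardized coefficients: scale predictors to mean 0 and sd 1, then refit
dat_std <- dat
dat_std[preds] <- scale(dat[preds])
fit_std <- glm(y ~ alpha + beta + gamma, data = dat_std, family = binomial)
std_coef <- coef(fit_std)[preds]
rel_share <- abs(std_coef) / sum(abs(std_coef))  # shares sum to 1

# 2. Likelihood-ratio chi-square for dropping each predictor from the full model
fit <- glm(y ~ alpha + beta + gamma, data = dat, family = binomial)
lrt <- drop1(fit, test = "Chisq")

# 3. -2LL of each single-predictor model compared with the full model
m2ll_full <- -2 * as.numeric(logLik(fit))
m2ll_single <- sapply(preds, function(v) {
  f <- glm(reformulate(v, response = "y"), data = dat, family = binomial)
  -2 * as.numeric(logLik(f))
})

print(round(rel_share, 3))
print(lrt)
print(m2ll_single - m2ll_full)  # smaller gap = that predictor alone explains more
```

Standardizing matters here because `beta` is simulated on a much wider scale than the other predictors, so its raw coefficient is not comparable with theirs. For a multi-class model, the same ideas would have to be applied per outcome level or through whole-model comparisons.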

Note: These are all approximations; I have not come across a rigorous method for calculating the contribution of individual predictors.

0
votes

To determine predictor performance (also known as feature importance), you can shuffle the values of each predictor variable across the samples, which essentially turns it into a random variable:

  1. Shuffle or randomize one predictor variable across the samples
  2. Fit the model(s) and score them with an appropriate metric. It is best to fit multiple models over different cross-validation folds to build a distribution of scores. Record the scores.
  3. Repeat the procedure (Steps 1 & 2) for every predictor variable.
  4. Identify, by inspection and/or with a statistic, the variable whose shuffling results in the largest decrease in model performance.

Essentially, you have determined which variable contributed the most information to the model by "removing" it.
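
The procedure above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the model is a binary `glm` rather than the asker's `mymodel1`, the metric is held-out log-loss, and `cv_logloss` is a hypothetical helper written for this example.

```r
# Simulated data, purely illustrative; gamma has no effect on y by construction
set.seed(42)
n <- 600
dat <- data.frame(alpha = rnorm(n), beta = rnorm(n), gamma = rnorm(n))
dat$y <- rbinom(n, 1, plogis(2 * dat$alpha + 0.5 * dat$beta))
preds <- c("alpha", "beta", "gamma")

# Hypothetical helper: per-fold held-out log-loss (lower is better)
cv_logloss <- function(d, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = nrow(d)))
  sapply(seq_len(k), function(i) {
    fit <- glm(y ~ alpha + beta + gamma, data = d[folds != i, ],
               family = binomial)
    p <- predict(fit, newdata = d[folds == i, ], type = "response")
    p <- pmin(pmax(p, 1e-12), 1 - 1e-12)  # guard against log(0)
    yy <- d$y[folds == i]
    -mean(yy * log(p) + (1 - yy) * log(1 - p))
  })
}

# Reference score with no predictor shuffled
baseline <- mean(cv_logloss(dat))

# Steps 1-3: shuffle one predictor at a time, refit, and record the score change
importance <- sapply(preds, function(v) {
  shuffled <- dat
  shuffled[[v]] <- sample(shuffled[[v]])  # break the link between v and y
  mean(cv_logloss(shuffled)) - baseline   # increase in log-loss
})

# Step 4: the largest increase in log-loss marks the most informative predictor
print(sort(importance, decreasing = TRUE))
```

Because the fold assignment is random, values near zero (or slightly negative) for an uninformative predictor are just noise; repeating the whole procedure several times gives the distribution of scores mentioned in Step 2.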