1 vote

When I run the logistic regression for a cars dataset:

carlogistic.fit4 <- glm(as.factor(Mpg01) ~ Weight+Year+Origin, data=carslogic, family="binomial")
summary(carlogistic.fit4)

I get the output below:

Call: glm(formula = as.factor(Mpg01) ~ Weight + Year + Origin, family = "binomial", data = carslogic)

Deviance Residuals: Min 1Q Median 3Q Max
-2.29189 -0.10014 -0.00078 0.19699 2.60606

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -2.697e+01  5.226e+00  -5.161 2.45e-07 ***
Weight         -6.006e-03  7.763e-04  -7.737 1.02e-14 ***
Year            5.677e-01  8.440e-02   6.726 1.75e-11 ***
OriginGerman    1.256e+00  5.172e-01   2.428   0.0152 *  
OriginJapanese  3.250e-01  5.462e-01   0.595   0.5519    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 549.79  on 396  degrees of freedom
Residual deviance: 151.06  on 392  degrees of freedom
AIC: 161.06

However, notice that the p-value for Japanese-origin cars is greater than 0.05, so that coefficient is not significant. I want to remove it from the model, but the column in the data is a single variable, Origin, as you can see in the code above. How do I exclude the Japanese origin level specifically from the model?

I think I have clearly specified that the p-value is greater than 0.05 for it and the model is not optimal. That is why it needs to be excluded. I cannot amend the original dataset in the CSV. – user10001876
This isn't the way stepwise regression works in R; terms are tested, and removed, termwise, and Origin is a single term. If you want to do this (I wouldn't recommend it), the workaround is to build the numeric model matrix explicitly (use model.matrix()), which will convert your factor into two numeric dummy variables; use the variables in the model matrix as separate numeric predictors, then you can drop them individually if you want. If you give a reproducible example someone might provide more details ... – Ben Bolker
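The workaround described in the comment above could be sketched like this (a sketch only, assuming carslogic contains the Mpg01, Weight, Year, and Origin columns shown in the question):

```r
## Build the numeric model matrix explicitly; the factor Origin becomes
## two numeric dummy columns, "OriginGerman" and "OriginJapanese".
X <- model.matrix(~ Weight + Year + Origin, data = carslogic)

## Recombine the response with the dummies (dropping the intercept column).
dat <- data.frame(Mpg01 = as.factor(carslogic$Mpg01), X[, -1])

## Now the dummies are ordinary numeric predictors, so one can be
## dropped individually (again, not generally recommended):
fit <- glm(Mpg01 ~ Weight + Year + OriginGerman,
           data = dat, family = "binomial")
summary(fit)
```

Note that dropping OriginJapanese this way silently merges the Japanese cars into the reference (baseline) category, which changes the meaning of the remaining coefficients.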

3 Answers

3
votes

OriginJapanese belongs in the model because it is tied to OriginGerman, which is significant: both dummies encode the same factor, Origin. You should think of significance in terms of the variable Origin as a whole, not in terms of its individual levels. If any of its levels has a significant effect, the variable can be considered significant.

If you wanted to remove the OriginJapanese effect, you would have to either remove Origin altogether or relabel the Japanese cars into another group (so they are mixed in with the other non-German cars).
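The relabelling option could be sketched as follows (an illustration only; it assumes the carslogic data frame from the question, and it looks up the baseline level of Origin rather than guessing its name, since the summary output only shows the non-reference levels):

```r
## Merge the "Japanese" level into the baseline (reference) level of Origin.
ref <- levels(carslogic$Origin)[1]          # the unnamed baseline level
carslogic$Origin2 <- as.character(carslogic$Origin)
carslogic$Origin2[carslogic$Origin2 == "Japanese"] <- ref
carslogic$Origin2 <- factor(carslogic$Origin2)

## Refit with the merged factor; only an OriginGerman-style contrast remains.
fit2 <- glm(as.factor(Mpg01) ~ Weight + Year + Origin2,
            data = carslogic, family = "binomial")
summary(fit2)
```

Be aware that this changes the interpretation of the German coefficient: it is now a contrast against a pooled non-German group.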

1
votes

Looking at the logistic regression call, I assume Origin is a factor encoded as dummy variables? If so, removing just OriginJapanese would not work here. You would need to remove Origin altogether, re-run the model, and compare the AIC and the significance of Weight and Year in the new model.
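The comparison described above could be sketched like this (assuming the fitted model carlogistic.fit4 and data frame carslogic from the question):

```r
## Refit without Origin entirely.
fit.noorigin <- glm(as.factor(Mpg01) ~ Weight + Year,
                    data = carslogic, family = "binomial")

## Compare the two models by AIC (lower is preferred).
AIC(carlogistic.fit4, fit.noorigin)

## A likelihood-ratio test of the Origin term as a whole:
anova(fit.noorigin, carlogistic.fit4, test = "Chisq")
```

The anova() comparison tests both Origin dummies jointly, which is the appropriate way to judge whether the factor as a whole earns its place in the model.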

As an example: if we have a dummy variable for Gender (male, female) and the female dummy appears insignificant, then removing the female dummy means you are changing the sampling and effectively looking only at the male population.

0
votes

One possibility is to look into stepwise selection with caret. Another possible approach is regularization chosen by cross-validation, e.g., the LAR/LASSO approaches.
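Both ideas could be sketched as follows (a sketch only, assuming the fitted model and data from the question; base R's step() performs AIC-based stepwise selection, and the glmnet package, assumed installed, fits a cross-validated LASSO):

```r
## AIC-based stepwise selection with base R. Note this adds or drops
## Origin as a whole term, never a single dummy.
step(carlogistic.fit4, direction = "both")

## LASSO with cross-validation via glmnet, which penalizes the dummy
## columns individually and can zero one out on its own.
library(glmnet)
X <- model.matrix(as.factor(Mpg01) ~ Weight + Year + Origin,
                  data = carslogic)[, -1]
y <- carslogic$Mpg01
cvfit <- cv.glmnet(X, y, family = "binomial")
coef(cvfit, s = "lambda.min")  # dummies shrunk to zero are dropped
```

Unlike stepwise selection on the formula, the LASSO operates on the expanded dummy columns, so it can retain OriginGerman while dropping OriginJapanese.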