1 vote

When I run the logistic regression for a cars dataset:

carlogistic.fit4 <- glm(as.factor(Mpg01) ~ Weight+Year+Origin, data=carslogic, family="binomial")
summary(carlogistic.fit4)

I get the output below:

Call: glm(formula = as.factor(Mpg01) ~ Weight + Year + Origin, family = "binomial", data = carslogic)

Deviance Residuals: Min 1Q Median 3Q Max
-2.29189 -0.10014 -0.00078 0.19699 2.60606

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -2.697e+01  5.226e+00  -5.161 2.45e-07 ***
Weight         -6.006e-03  7.763e-04  -7.737 1.02e-14 ***
Year            5.677e-01  8.440e-02   6.726 1.75e-11 ***
OriginGerman    1.256e+00  5.172e-01   2.428   0.0152 *  
OriginJapanese  3.250e-01  5.462e-01   0.595   0.5519    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 549.79  on 396  degrees of freedom
Residual deviance: 151.06  on 392  degrees of freedom
AIC: 161.06

However, notice that the p-value for Japanese-origin cars is greater than 0.05, so that coefficient is not significant. I want to remove it from the model, but the column in the data is a single variable, Origin, as you can see in the code above. How do I exclude the Japanese origin level specifically from the model?

I think I have clearly specified that the p-value is greater than 0.05 for it and the model is not optimal. That is why it needs to be excluded. I cannot amend the original dataset in the CSV. – user10001876
This isn't the way stepwise regression works in R; terms are tested, and removed, termwise, and Origin is a single term. If you want to do this (I wouldn't recommend it), the workaround is to build the numeric model matrix explicitly (use model.matrix()), which will convert your factor into two numeric dummy variables; use the variables in the model matrix as separate numeric predictors, then you can drop them individually if you want. If you give a reproducible example someone might provide more details ... – Ben Bolker
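The workaround described in the comment above could be sketched like this (a sketch only, assuming carslogic contains the Mpg01, Weight, Year, and Origin columns shown in the question):

```r
## Build the numeric model matrix explicitly; the factor Origin becomes
## two numeric dummy columns, "OriginGerman" and "OriginJapanese".
X <- model.matrix(~ Weight + Year + Origin, data = carslogic)

## Recombine the response with the dummies (dropping the intercept column).
dat <- data.frame(Mpg01 = as.factor(carslogic$Mpg01), X[, -1])

## Now the dummies are ordinary numeric predictors, so one can be
## dropped individually (again, not generally recommended):
fit <- glm(Mpg01 ~ Weight + Year + OriginGerman,
           data = dat, family = "binomial")
summary(fit)
```

Note that dropping OriginJapanese this way silently merges the Japanese cars into the reference (baseline) category, which changes the meaning of the remaining coefficients.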

3 Answers

3
votes

OriginJapanese belongs in the model because it is tied to OriginGerman, which is significant: both dummies encode the same factor, Origin. You should think of significance in terms of the variable Origin as a whole, not in terms of its individual levels. If any of its levels has a significant effect, the variable can be considered significant.

If you wanted to remove the OriginJapanese effect, you would have to either remove Origin altogether or relabel the Japanese cars into another group (so they are mixed in with the other non-German cars).
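The relabelling option could be sketched as follows (an illustration only; it assumes the carslogic data frame from the question, and it looks up the baseline level of Origin rather than guessing its name, since the summary output only shows the non-reference levels):

```r
## Merge the "Japanese" level into the baseline (reference) level of Origin.
ref <- levels(carslogic$Origin)[1]          # the unnamed baseline level
carslogic$Origin2 <- as.character(carslogic$Origin)
carslogic$Origin2[carslogic$Origin2 == "Japanese"] <- ref
carslogic$Origin2 <- factor(carslogic$Origin2)

## Refit with the merged factor; only an OriginGerman-style contrast remains.
fit2 <- glm(as.factor(Mpg01) ~ Weight + Year + Origin2,
            data = carslogic, family = "binomial")
summary(fit2)
```

Be aware that this changes the interpretation of the German coefficient: it is now a contrast against a pooled non-German group.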

1
votes

Looking at the logistic regression call, I assume Origin is a factor encoded as dummy variables? If so, removing just OriginJapanese would not work here. You would need to remove Origin altogether, re-run the model, and compare the AIC and the significance of Weight and Year in the new model.
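The comparison described above could be sketched like this (assuming the fitted model carlogistic.fit4 and data frame carslogic from the question):

```r
## Refit without Origin entirely.
fit.noorigin <- glm(as.factor(Mpg01) ~ Weight + Year,
                    data = carslogic, family = "binomial")

## Compare the two models by AIC (lower is preferred).
AIC(carlogistic.fit4, fit.noorigin)

## A likelihood-ratio test of the Origin term as a whole:
anova(fit.noorigin, carlogistic.fit4, test = "Chisq")
```

The anova() comparison tests both Origin dummies jointly, which is the appropriate way to judge whether the factor as a whole earns its place in the model.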

As an example: if we have a dummy variable for Gender (male, female) and the female dummy appears insignificant, then removing the female dummy means you are changing the sampling and effectively looking only at the male population.

0
votes

One possibility is to look into stepwise selection with caret. Another possible approach is regularization chosen by cross-validation, e.g., the LAR/LASSO approaches.
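Both ideas could be sketched as follows (a sketch only, assuming the fitted model and data from the question; base R's step() performs AIC-based stepwise selection, and the glmnet package, assumed installed, fits a cross-validated LASSO):

```r
## AIC-based stepwise selection with base R. Note this adds or drops
## Origin as a whole term, never a single dummy.
step(carlogistic.fit4, direction = "both")

## LASSO with cross-validation via glmnet, which penalizes the dummy
## columns individually and can zero one out on its own.
library(glmnet)
X <- model.matrix(as.factor(Mpg01) ~ Weight + Year + Origin,
                  data = carslogic)[, -1]
y <- carslogic$Mpg01
cvfit <- cv.glmnet(X, y, family = "binomial")
coef(cvfit, s = "lambda.min")  # dummies shrunk to zero are dropped
```

Unlike stepwise selection on the formula, the LASSO operates on the expanded dummy columns, so it can retain OriginGerman while dropping OriginJapanese.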