When I run the logistic regression for a cars dataset:
carlogistic.fit4 <- glm(as.factor(Mpg01) ~ Weight+Year+Origin, data=carslogic, family="binomial")
summary(carlogistic.fit4)
I get the output below:
Call:
glm(formula = as.factor(Mpg01) ~ Weight + Year + Origin, family = "binomial", data = carslogic)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.29189  -0.10014  -0.00078   0.19699   2.60606

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)    -2.697e+01  5.226e+00  -5.161 2.45e-07 ***
Weight         -6.006e-03  7.763e-04  -7.737 1.02e-14 ***
Year            5.677e-01  8.440e-02   6.726 1.75e-11 ***
OriginGerman    1.256e+00  5.172e-01   2.428   0.0152 *
OriginJapanese  3.250e-01  5.462e-01   0.595   0.5519
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 549.79 on 396 degrees of freedom
Residual deviance: 151.06 on 392 degrees of freedom
AIC: 161.06
However, the p-value for Japanese-origin cars is greater than 0.05, so that coefficient is not statistically significant. I want to remove it from the model, but the column is Origin, as you can see in the initial code. How do I exclude Japanese origin specifically from the model?
Origin is a single term. If you want to do this (I wouldn't recommend it), the workaround is to build the numeric model matrix explicitly (use model.matrix()), which will convert your factor into two numeric dummy variables; use the variables in the model matrix as separate numeric predictors, and then you can drop them individually if you want. If you give a reproducible example, someone might provide more details ... – Ben Bolker
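For illustration, here is a minimal sketch of that model.matrix() workaround. The real carslogic data is not shown in the question, so the data frame below is a simulated, hypothetical stand-in: the level names (American, German, Japanese), the value ranges, and the coefficients used to generate Mpg01 are all assumptions, and the numbers it produces will not match the output above.

```r
# Hypothetical stand-in for carslogic; the real data is not available here.
set.seed(1)
n <- 300
carslogic <- data.frame(
  Weight = round(runif(n, 1600, 5000)),
  Year   = sample(70:82, n, replace = TRUE),
  Origin = factor(sample(c("American", "German", "Japanese"), n, replace = TRUE))
)
# Simulated 0/1 response; these generating coefficients are arbitrary assumptions.
eta <- 0.0015 * (3300 - carslogic$Weight) +
  0.15 * (carslogic$Year - 76) +
  0.8 * (carslogic$Origin == "German")
carslogic$Mpg01 <- rbinom(n, 1, plogis(eta))

# Expand the factor into numeric dummy columns and drop the intercept column,
# since glm() adds its own intercept.
X <- model.matrix(~ Weight + Year + Origin, data = carslogic)[, -1]
dat <- data.frame(Mpg01 = carslogic$Mpg01, X)

# Full model on the dummies: the same fit as using Origin as a factor.
fit_full <- glm(Mpg01 ~ Weight + Year + OriginGerman + OriginJapanese,
                data = dat, family = binomial)

# Reduced model: drop only the Japanese dummy.
fit_reduced <- update(fit_full, . ~ . - OriginJapanese)

summary(fit_reduced)
# Likelihood-ratio comparison of the two nested models.
anova(fit_reduced, fit_full, test = "Chisq")
```

Note what the reduced model means: with OriginJapanese removed, Japanese cars are no longer distinguished from the baseline level (American in this sketch), so the fit is the same as one where those two levels are merged into a single category. That change in interpretation is one reason to be cautious about dropping individual dummy variables based on a p-value alone.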