I'm a little confused about how to interpret the coefficients in a multiple regression with two categorical variables, using the mtcars dataset as an example. According to some online sources and books, the coefficient of one categorical variable is the difference in means between a given level and the reference level, with the other variable held at its reference level. In this example, according to the aggregated results, the coefficient of factor(vs)1 should be 81.8 - 91 = -9.2, but it isn't: it's -13.92. Those claims seem to be wrong.
Can someone clarify this? How should I interpret the coefficients in terms of a 'mean difference'?
df <- mtcars  # assumption: df is simply the built-in mtcars data
fit <- lm(hp ~ factor(vs) + factor(cyl), data = df)
Call:
lm(formula = hp ~ factor(vs) + factor(cyl), data = df)
Coefficients:
 (Intercept)   factor(vs)1  factor(cyl)6  factor(cyl)8
       95.29        -13.92         34.95        113.93
# Then the mean of hp at each combination of vs and cyl.
aggregate(hp~vs+cyl, df, mean)
  vs cyl       hp
1  0   4  91.0000
2  1   4  81.8000
3  0   6 131.6667
4  1   6 115.2500
5  0   8 209.2143
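The comparison I'm making can be reproduced with the sketch below. It assumes `df` is just the built-in `mtcars` data; the names `cell_means`, `raw_diff`, and `fitted_coef` are used only for this illustration.

```r
# Assumption: df is the built-in mtcars dataset.
df <- mtcars

# Additive model with two categorical predictors
# (R uses treatment contrasts for unordered factors by default).
fit <- lm(hp ~ factor(vs) + factor(cyl), data = df)

# Observed mean hp in each vs x cyl cell.
cell_means <- aggregate(hp ~ vs + cyl, data = df, FUN = mean)

# Raw difference in cell means between vs = 1 and vs = 0
# at the reference level of cyl (4 cylinders).
raw_diff <- with(cell_means, hp[vs == 1 & cyl == 4] - hp[vs == 0 & cyl == 4])

# Coefficient the additive model reports for vs = 1.
fitted_coef <- unname(coef(fit)["factor(vs)1"])

print(c(raw_diff = raw_diff, fitted_coef = fitted_coef))
```

The two printed numbers are the ones that disagree: the raw difference between the cell means and the factor(vs)1 coefficient from the fit.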
My second question: what if I treat those categorical variables as ordered factors? The model then has linear (.L) and quadratic (.Q) terms for those factors. How should I interpret those coefficients?
lm(data=df, hp~factor(vs, ordered=TRUE)+factor(cyl, ordered=TRUE))
Call:
lm(formula = hp ~ factor(vs, ordered = TRUE) + factor(cyl, ordered = TRUE),
data = df)
Coefficients:
                  (Intercept)   factor(vs, ordered = TRUE).L
                       137.96                          -9.84
factor(cyl, ordered = TRUE).L  factor(cyl, ordered = TRUE).Q
                        80.56                          17.97
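For context, the .L and .Q suffixes come from the orthogonal polynomial contrasts (`contr.poly`) that R applies to ordered factors under its default options. A minimal sketch to display the coding behind the fit above, again assuming `df` is the built-in `mtcars` data; `vs_ord`, `cyl_ord`, `vs_contr`, and `cyl_contr` are illustrative names only.

```r
# Assumption: df is the built-in mtcars dataset.
df <- mtcars

# Recreate the two ordered factors used in the model above.
vs_ord  <- factor(df$vs,  ordered = TRUE)
cyl_ord <- factor(df$cyl, ordered = TRUE)

# Contrast matrices R assigns to them (contr.poly under default options).
vs_contr  <- contrasts(vs_ord)   # 2 levels: a single linear column (.L)
cyl_contr <- contrasts(cyl_ord)  # 3 levels: linear (.L) and quadratic (.Q) columns

print(vs_contr)
print(cyl_contr)
```

With only two levels, the .L column is -1/sqrt(2) for vs = 0 and +1/sqrt(2) for vs = 1, so moving from vs = 0 to vs = 1 changes the prediction by sqrt(2) times the .L coefficient. Here that is -9.84 * 1.414, roughly -13.92, which appears to match the factor(vs)1 coefficient from the first fit.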
Thank you very much in advance.