
I have a categorical independent variable (with options of "yes" or "no") that I want to add to my panel linear model. According to the answer to "After generating dummy variables?", the lm function automatically creates dummy variables for categorical variables.

Does this mean that creating dummy variables with, e.g., dummy.data.frame is unnecessary, and I can just add my variable to the model formula and it will automatically be treated as a dummy variable (even if the data is not numeric)? And does the same hold for the plm function?

Also, I don't have much data to begin with. Would it hurt if I manually turned the data into numbers (i.e. "yes"=1, "no"=0) without creating a dummy variable?

What data type are the answers? You can check with str(df). If they are factor variables, lm() will handle them for you. If they are character, lm() will coerce them to factors as well, but converting them explicitly with factor() lets you control the reference level. - Roman
@Roman They're factor variables. Thanks! - user10831611
If they are not, you could use factor() inside the model, e.g. lm(y ~ x + factor(dummy), data). - jay.sf
@jay.sf Thank you—will take note! - user10831611

1 Answer


You do not need to create dummy variables yourself for the lm() function: any factor in the formula is expanded into dummy (indicator) columns automatically. To illustrate, we'll run a regression model on the mtcars data set, using am (0 = automatic, 1 = manual transmission) as a factor variable.

summary(lm(mpg ~ wt + factor(am), data = mtcars))

...and the output:

> summary(lm(mpg ~ wt + factor(am),data=mtcars))

Call:
lm(formula = mpg ~ wt + factor(am), data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5295 -2.3619 -0.1317  1.4025  6.8782 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 37.32155    3.05464  12.218 5.84e-13 ***
wt          -5.35281    0.78824  -6.791 1.87e-07 ***
factor(am)1 -0.02362    1.54565  -0.015    0.988    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.098 on 29 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7358 
F-statistic: 44.17 on 2 and 29 DF,  p-value: 1.579e-09
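
On the question of manually coding "yes" = 1 and "no" = 0: for a two-level variable this should give the same fit as passing the factor directly. The sketch below is only an illustration under stated assumptions: the yes/no column `manual` and the hand-coded column `manual01` are hypothetical names, built from `mtcars$am` to stand in for the variable in the question. It uses model.matrix() to show the indicator column that lm() builds from the factor, then compares the two fits.

```r
# Hypothetical yes/no version of mtcars$am, standing in for the asker's variable
df <- mtcars
df$manual <- factor(ifelse(df$am == 1, "yes", "no"), levels = c("no", "yes"))

# Inspect the column type, as suggested in the comments
str(df$manual)

# model.matrix() shows the 0/1 column that lm() builds from the factor
head(model.matrix(mpg ~ wt + manual, data = df))

# Fit once with the factor and once with a hand-coded 0/1 column
fit_factor <- lm(mpg ~ wt + manual, data = df)
df$manual01 <- as.integer(df$manual == "yes")
fit_manual <- lm(mpg ~ wt + manual01, data = df)

# Compare the estimates; only the coefficient names differ
# ("manualyes" versus "manual01")
all.equal(unname(coef(fit_factor)), unname(coef(fit_manual)))
```

The first level of the factor ("no" here) is the reference category, so the coefficient is the difference for "yes" relative to "no", which is exactly what the 0/1 coding expresses. As far as I know, plm() builds its design matrix through the same formula machinery, so a factor regressor should be expanded in the same way there; keep in mind that in a within (fixed effects) model, a factor that never changes within an individual drops out of the estimation.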