I am getting used to the plm package and tried to estimate a fixed effects model with individual effects (just for the sake of doing it; please ignore any misspecification), first with the plm() function and then with the lm() function. I found that I can only replicate the plm() results when I include a dummy variable for EACH of the N individuals in the lm() regression. As far as I know, only N-1 dummy variables should ever be included in the regression. Does anyone know how plm handles the individual fixed effects? The same happens with time fixed effects, by the way.
Here is my code, using the example data from Grunfeld (1958) (also included in the plm package); please excuse the rather clumsy dummy variable creation:
################################################################################
## Fixed Effects Estimation with plm() and lm() with individual effects
################################################################################
# Prepare R sheet
library(plm)
library(dplyr)
################################################################################
# Get data
data<-read.csv("http://people.stern.nyu.edu/wgreene/Econometrics/grunfeld.csv")
class(data)
data.tbl<-as_tibble(data)  # as.tbl() is deprecated in current dplyr
#I = Investment
#F = Real Value of the Firm
#C = Real Value of the Firm's Capital Stock
################################################################################
# create firm (individual) dummies
firmdum<-rbind(matrix(rep(c(1,0,0,0,0,0,0,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,1,0,0,0,0,0,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,1,0,0,0,0,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,1,0,0,0,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,0,1,0,0,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,0,0,1,0,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,0,0,0,1,0,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,0,0,0,0,1,0,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,0,0,0,0,0,1,0),20),ncol = 10,byrow = T),
matrix(rep(c(0,0,0,0,0,0,0,0,0,1),20),ncol = 10,byrow = T)
)
colnames(firmdum)<-paste("firm",c(1:10),sep = "")
firmdum.tbl<-as_tibble(firmdum)  # tbl_df() is deprecated in current dplyr
firmdum.tbl<-sapply(firmdum.tbl, as.integer)
###############################################################################################
# Estimation with individual fixed effects (plm)
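dataset<-as_tibble(cbind(data.tbl,firmdum.tbl))
# index is stated explicitly; by default plm() takes the first two columns (FIRM, YEAR)
est1<- plm(I ~ F + C, data = dataset, index = c("FIRM", "YEAR"), model = "within", effect = "individual")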
summary(est1)
plot(residuals(est1))
# Replication with lm
individualeffects<-as_tibble(cbind(data.tbl,firmdum.tbl))
est2<-lm(I ~ . -1 -FIRM -YEAR, individualeffects)
summary(est2)
plot(residuals(est2))
# Now exclude 1 dummy (as should be done in fixed effects)
individualeffects<-as_tibble(cbind(data.tbl,firmdum.tbl))
est3<-lm(I ~ . -1 -FIRM -YEAR -firm1, individualeffects)
summary(est3)
plot(residuals(est3))
The difference is marginal, but I would like to know how the plm() function handles fixed effects. I ran into a problem when running tests on a model, which did not arise when I did the fixed effects estimation with the lm() function and excluded one year dummy and one individual dummy. I'd appreciate any help or recommendations!
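
For comparison, here is a minimal sketch of the same check using factor() instead of hand-built dummies. It assumes the copy of the Grunfeld data shipped with plm, whose columns are named firm, year, inv, value and capital rather than FIRM, YEAR, I, F and C. The object names (fe_plm, fe_lm_int, fe_lm_noint, slopes) are only illustrative. With an intercept, lm() drops one firm level by itself (N-1 dummies); without an intercept it keeps all N.

```r
library(plm)

# Built-in copy of the Grunfeld data (assumption: columns firm, year, inv,
# value, capital, which differ from the CSV column names used above).
data("Grunfeld", package = "plm")

# Within estimator: firm effects are removed by demeaning, so no dummy
# coefficients appear in the summary.
fe_plm <- plm(inv ~ value + capital, data = Grunfeld,
              index = c("firm", "year"),
              model = "within", effect = "individual")

# Dummy-variable regression WITH an intercept: one firm level is dropped
# automatically, leaving N - 1 dummies.
fe_lm_int <- lm(inv ~ value + capital + factor(firm), data = Grunfeld)

# Dummy-variable regression WITHOUT an intercept: all N firm dummies are kept.
fe_lm_noint <- lm(inv ~ value + capital + factor(firm) - 1, data = Grunfeld)

# Collect the slope estimates from the three fits side by side.
slopes <- rbind(
  plm             = coef(fe_plm)[c("value", "capital")],
  lm_intercept    = coef(fe_lm_int)[c("value", "capital")],
  lm_no_intercept = coef(fe_lm_noint)[c("value", "capital")]
)
print(slopes)

# Firm-specific intercepts implied by the within fit.
print(fixef(fe_plm))
```

If the three rows of slopes agree, any remaining differences come from how the intercept and the dummies are specified rather than from plm itself. Note that est3 above removes both the intercept and firm1, which is not the same as either lm() variant in this sketch.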