
I am trying to run regressions in R (multiple models: Poisson, binomial, and continuous) that include fixed effects for groups (e.g. schools) to adjust for general group-level differences (essentially demeaning by group), and that cluster standard errors to account for the nesting of participants within the groups. I am also running these over imputed data frames (created with mice). Different disciplines seem to use the phrase 'fixed effects' differently, so I am having a hard time searching to troubleshoot.

I have fit random-intercept models (with lme4), but they do not account for the school fixed effects (and the random effects are not of interest to my research questions); putting the groups in as dummies slows the run down tremendously. I could also run a single-level glm/lm with group dummies, but I have not been able to find a strategy to cluster the standard errors with the imputed data (I tried the clusterSE package). I could hand-calculate the demeaning, but it seems like there should be a more direct way to achieve this.
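For the "group dummies plus clustered standard errors" route, one possibility is to fit the dummy-variable glm on each completed dataset, compute a cluster-robust covariance with the sandwich package, and pool by Rubin's rules by hand. This is only a minimal sketch, assuming a mice object `imp` and variables `y`, `x`, and `school` (all hypothetical names standing in for the real data):

# Sketch: dummy-variable glm per imputation, cluster-robust vcov, manual Rubin pooling.
# `imp`, `y`, `x`, and `school` are hypothetical names.
library(mice)
library(sandwich)

m <- imp$m
fits <- lapply(seq_len(m), function(i) {
  d <- complete(imp, i)
  glm(y ~ x + factor(school), family = poisson, data = d)
})

# cluster-robust covariance matrix for each imputed-data fit
vcovs <- lapply(seq_len(m), function(i) {
  d <- complete(imp, i)
  vcovCL(fits[[i]], cluster = d$school)
})

# pool by Rubin's rules: within-imputation variance W plus between-imputation variance B
ests   <- sapply(fits, coef)                      # p x m matrix of coefficients
qbar   <- rowMeans(ests)                          # pooled point estimates
W      <- Reduce(`+`, lapply(vcovs, diag)) / m    # mean within-imputation variances
B      <- apply(ests, 1, var)                     # between-imputation variances
totvar <- W + (1 + 1/m) * B
cbind(estimate = qbar, se = sqrt(totvar))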


I have also looked at the lfe package, but it does not seem to have glm options, and its demeanlist function does not seem to be compatible with the imputed data frames.
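Since demeanlist cannot be fed the mids object directly, the hand-demeaning mentioned above can be done per completed dataset with base R's ave(); note that this within transformation only reproduces the dummy-variable estimates for the linear (continuous) models, not the glms. A sketch, using the same hypothetical `imp`, `y`, `x`, and `school` names:

# Sketch: group-demean each completed dataset with ave() (hypothetical variable names)
library(mice)

demean_one <- function(i) {
  d <- complete(imp, i)
  d$y_dm <- d$y - ave(d$y, d$school)   # subtract the school mean of y
  d$x_dm <- d$x - ave(d$x, d$school)   # subtract the school mean of x
  d
}
demeaned <- lapply(seq_len(imp$m), demean_one)

# within (fixed-effects) estimate for the continuous outcome on one completed dataset
summary(lm(y_dm ~ x_dm, data = demeaned[[1]]))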

In Stata, the command would be xtreg ..., fe vce(cluster clustervar) (fe = fixed effects, vce(cluster ...) = clustered standard errors), with the mi prefix added to run over imputed data frames. I could switch to Stata for the modeling but would definitely prefer to stay with R if possible!

Please let me know if this would be better posted on Cross Validated; I was on the fence but went with this site since it seemed to be more of a coding question.

Thank you!

Look into the plm package to run a fixed-effects panel model, then pass the fit to the lmtest package's coeftest() to estimate clustered standard errors (a sketch follows these comments). See this Cross Validated post. – Parfait
Thanks for the post - I had looked at the plm package, but since I did not have panel data I thought it wasn't the right fit. The comments on that post were helpful for using plm regardless. – Elizabeth H
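For reference, the plm + coeftest() route suggested in the comment could look roughly like the following on a single completed dataset (plm covers the linear, continuous-outcome case). This is a sketch using the same hypothetical `imp`, `y`, `x`, and `school` names, and it does not by itself handle pooling across imputations:

# Sketch: within (fixed-effects) estimator via plm, clustered SEs via coeftest()
library(mice)
library(plm)
library(lmtest)

d   <- complete(imp, 1)
fit <- plm(y ~ x, data = d, index = "school", model = "within")  # school fixed effects

# cluster-robust (by school) standard errors
coeftest(fit, vcov = vcovHC(fit, type = "HC1", cluster = "group"))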

1 Answer


I would block bootstrap. The "block" handles the clustering and "bootstrap" handles the generated regressors.

There is probably a more elegant way to make this extensible to other estimators, but this should get you started.

# junk data
x <- rnorm(100)
y <- 1 + 2*x + rnorm(100)
dat1 <- data.frame(y, x, id=seq_along(y))
summary(lm(y ~ x, data=dat1))

# same point estimates, but lower SEs
dat2 <- dat1[rep(seq_along(y), each=10), ]
summary(lm(y ~ x, data=dat2))

# block bootstrap helper function
library(boot)
myStatistic <- function(ids, i) {
  # stack the sampled clusters, keeping duplicates when a cluster is drawn more than once
  myData <- do.call(rbind, lapply(i, function(j) dat2[dat2$id == ids[j], ]))
  myLm <- lm(y ~ x, data = myData)
  myLm$coefficients
}

# same point estimates from the helper function on the original data
myStatistic(unique(dat2$id), 1:100)


# block bootstrap recovers correct SEs
boot(unique(dat2$id), myStatistic, 500)
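
One way this might be extended to the mice-imputed data (not part of the original answer, so treat it as a sketch) is to run the block bootstrap within each completed dataset and then pool by Rubin's rules. `imp`, `y`, `x`, and `school` are hypothetical names standing in for the real data:

# Sketch: block bootstrap within each imputation, then Rubin's-rules pooling
library(boot)
library(mice)

bootOne <- function(d) {
  stat <- function(ids, i) {
    bd <- do.call(rbind, lapply(i, function(j) d[d$school == ids[j], ]))
    # return only the coefficient of interest so the statistic has a fixed length
    coef(glm(y ~ x + factor(school), family = poisson, data = bd))[["x"]]
  }
  boot(unique(d$school), stat, R = 500)
}

boots <- lapply(seq_len(imp$m), function(i) bootOne(complete(imp, i)))

est <- sapply(boots, function(b) b$t0)            # per-imputation point estimates
v   <- sapply(boots, function(b) var(b$t[, 1]))   # per-imputation bootstrap variances
m   <- imp$m
pooled_est <- mean(est)
pooled_se  <- sqrt(mean(v) + (1 + 1/m) * var(est))  # Rubin's rules
c(estimate = pooled_est, se = pooled_se)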