I normally work within a generalized least squares framework estimating, what Wooldridge's Introductory (2013) calls, Random Effects and Fixed Effects models on longitudinal data indexed by an individual and a time dimension.
I've been using the Feasible GLS estimation in plm()
, from the plm package, to estimate the Random Effects Model – what some stats literature term the Mixed Model. The plm()
function takes an index
argument where I indicate the individual and time indexes. However, I’m now faced with some data where each individual has several measures at each time-point, i.e. what a group-wise structure.
I’ve found out that it’s possible to set up such a model using lmer()
from the lme4 package, however I am a bit confused by the differences in jargon, and also the likelihood framework, and I wanted to know if specified the model correctly. I fear I could overlook at more substantial as I am not familiar with the framework and this terminology.
I can replicate my usual plm()
model using lmer()
, but I am unsure as to how I could add the grouping. I’ve tried to illustrate my question in the following.
I found some data that looks somewhat like my data to illustrate my situation. First some packages that are needed,
install.packages(c("mlmRev", "plm", "lme4", "stargazer"), dependencies = TRUE)
and then the data
data(egsingle, package = "mlmRev")
egsingle
is a unbalanced panel consisting of 1721 school children, grouped in 60 schools, across five time points. These data are originally distributed with the HLM software package (Bryk, Raudenbush and Congdon, 1996), but can be found the mlmrev package, for details see ? mlmRev::egsingle
Some light data management
dta <- egsingle
dta$Female <- with(dta, ifelse(female == 'Female', 1, 0))
Here’s a snippet for the data
dta[118:127,c('schoolid','childid','math','year','size','Female')]
#> schoolid childid math year size Female
#> 118 2040 289970511 -1.830 -1.5 502 1
#> 119 2040 289970511 -1.185 -0.5 502 1
#> 120 2040 289970511 0.852 0.5 502 1
#> 121 2040 289970511 0.573 1.5 502 1
#> 122 2040 289970511 1.736 2.5 502 1
#> 123 2040 292772811 -3.144 -1.5 502 0
#> 124 2040 292772811 -2.097 -0.5 502 0
#> 125 2040 292772811 -0.316 0.5 502 0
#> 126 2040 293550291 -2.097 -1.5 502 0
#> 127 2040 293550291 -1.314 -0.5 502 0
Here’s how I would set a random effects model without the schoolid
using plm()
,
library(plm)
reg.re.plm <- plm(math~Female+size+year, dta, index = c("childid", "year"), model="random")
# summary(reg.re.plm)
I can reproduce these results lme4
like this
require(lme4)
dta$year <- as.factor(dta$year)
reg.re.lmer <- lmer(math~Female+size+year+(1|childid), dta)
# summary(reg.re.lmer)
Now, from reading chapter 2 in Bates (2010) “lme4: Mixed-effects modeling
with R” I believe I’ve this is how I would specific the model including the cluster level, schoolid
,
reg.re.lmer.in.school <- lmer(math~Female+size+year+(1|childid)+(1|schoolid), dta)
# summary(reg.re.lmer.in.school)
However, when I look at the results I am not too convinced I’ve actually specified it correctly (see below).
In my actual data the repeated measures are within individuals, but I take that I can use this data as example. I would appreciate any advice on how to proceed. Maybe a reference to a worked example with notation/terminology not too far from what is used in Wooldridge (2013). And, how do I work backwards and write up the specification for the reg.re.lmer.in.school
model?
# library(stargazer)
stargazer::stargazer(reg.re.plm, reg.re.lmer, reg.re.lmer.in.school, type="text")
#> =====================================================================
#> Dependent variable:
#> -------------------------------------------------
#> math
#> panel linear
#> linear mixed-effects
#> (1) (2) (3)
#> ---------------------------------------------------------------------
#> Female -0.025 -0.025 0.008
#> (0.046) (0.047) (0.042)
#>
#> size -0.0004*** -0.0004*** -0.0003
#> (0.0001) (0.0001) (0.0002)
#>
#> year-1.5 0.878*** 0.876*** 0.866***
#> (0.059) (0.059) (0.059)
#>
#> year-0.5 1.882*** 1.880*** 1.870***
#> (0.059) (0.058) (0.058)
#>
#> year0.5 2.575*** 2.574*** 2.562***
#> (0.059) (0.059) (0.059)
#>
#> year1.5 3.149*** 3.147*** 3.133***
#> (0.060) (0.059) (0.059)
#>
#> year2.5 3.956*** 3.954*** 3.939***
#> (0.060) (0.060) (0.060)
#>
#> Constant -2.671*** -2.669*** -2.693***
#> (0.085) (0.086) (0.152)
#>
#> ---------------------------------------------------------------------
#> Observations 7,230 7,230 7,230
#> R2 0.735
#> Adjusted R2 0.735
#> Log Likelihood -8,417.815 -8,284.357
#> Akaike Inf. Crit. 16,855.630 16,590.720
#> Bayesian Inf. Crit. 16,924.490 16,666.460
#> F Statistic 2,865.391*** (df = 7; 7222)
#> =====================================================================
#> Note: *p<0.1; **p<0.05; ***p<0.01