
I'm working on a state and year fixed effects regression with three observations per state/year combination, one for each race (white, black, other); see the edit below.
So far, I've been using base R's lm function to estimate a fixed effects regression that covers all three races, entering state, year, and race as factor variables. I am also running separate regressions for each individual race. The problem is that I would prefer to use the plm package so that I can get the within R-squared for the model with all races, but it gives me errors.

Edit: I included a picture of my data. The data is a balanced panel: 34 states, 12 years (2003-2014), and 3 races for each state/year combination, for a total of 1,224 observations (34 x 12 x 3).

Here is the code I'm using to run the plm regression:

#plm regression
plm.reg <- plm(drugcrime_ar ~ decrim_dummy + median_income + factor(race),
               data = my.data, index=c("st_name","year"), model = "within",
               effect = "twoways")

The errors I get in return:

Error in pdim.default(index[[1]], index[[2]]): 
   duplicate couples (id-time) 
In addition: Warning messages: 
1: In pdata.frame(data, index) :
   duplicate couples (id-time) in resulting pdata.frame
   to find out which, use e.g. table(index(your_pdataframe), useNA = "ifany"
2: In is.pbalanced.default(index[[1]], index[[2]]) :
   duplicate couples (id-time)
3: In is.pbalanced.default(index[[1]], index[[2]]) :
   duplicate couples (id-time)

Is there a workaround for this or am I out of luck?

Could you post a reproducible example? E.g.: stackoverflow.com/questions/5963269/… - Edgar Santos
Show the layout of your data and how you create the pdata.frame and run the estimation. - Helix123
I edited my post and added the information you requested. - dmunslow
It seems to me that you actually have some kind of nested panel structure. The development version of plm implements the nested model as in Baltagi/Song/Jung (2001), but I do not know if it is suitable for your situation. - Helix123

1 Answer


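The plm function needs each id/time pair to identify exactly one row. With index = c("st_name", "year"), every state/year pair appears three times, once per race, which is what the "duplicate couples (id-time)" error reports.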
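To see the duplication concretely, here is a minimal base-R sketch on made-up data. Only the column names st_name, year, and race come from the question; the `toy` data frame and its values are hypothetical stand-ins for my.data.

```r
# Toy stand-in for my.data: 2 states x 2 years x 3 races (values are made up)
toy <- expand.grid(
  st_name = c("A", "B"),
  year    = 2003:2004,
  race    = c("white", "black", "other"),
  stringsAsFactors = FALSE
)

# Count rows per (st_name, year) pair; any count above 1 means the pair
# does not uniquely identify a row
pair_counts <- table(toy$st_name, toy$year)
print(pair_counts)

# TRUE if at least one (st_name, year) pair occurs more than once
has_dupes <- anyDuplicated(toy[, c("st_name", "year")]) > 0
print(has_dupes)
```

Running the same two checks on the real data should show a count of 3 in every state/year cell if the panel is laid out as described.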

If each st_name/race pair forms an "individual" (or whatever name you give to this dimension of the panel), then you could do:

library(dplyr)
library(plm)

# One id per state/race combination
my.data$id <- group_indices(my.data, st_name, race)
# In dplyr >= 1.0.0 this form of group_indices() is deprecated; the equivalent is
# my.data <- my.data %>% group_by(st_name, race) %>% mutate(id = cur_group_id()) %>% ungroup()

plm.reg <- plm(drugcrime_ar ~ decrim_dummy + median_income + factor(race),
           data = my.data, index=c("id","year"), model = "within",
           effect = "twoways")

Note, however, that this does not use the nested panel structure that @Helix123 suggested. It only redefines the first dimension of the panel.
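As a sketch of the same redefinition without dplyr, base R's interaction() can build the combined id. The `toy` data frame below is made up for illustration; only the column names st_name, year, and race are taken from the question.

```r
# Toy stand-in for my.data: 2 states x 2 years x 3 races (values are made up)
toy <- expand.grid(
  st_name = c("A", "B"),
  year    = 2003:2004,
  race    = c("white", "black", "other"),
  stringsAsFactors = FALSE
)

# Combine state and race into a single individual identifier
toy$id <- interaction(toy$st_name, toy$race, drop = TRUE)

# Position of the first duplicated (id, year) pair, or 0 if there is none
n_dupes <- anyDuplicated(toy[, c("id", "year")])
print(n_dupes)

# Number of distinct races observed within each id
races_per_id <- tapply(toy$race, toy$id, function(x) length(unique(x)))
print(races_per_id)
```

Because race never varies within an id defined this way, the factor(race) term is collinear with the individual fixed effects in a within model, so its coefficients cannot be estimated separately from them.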