I have an unbalanced panel in Stata with 3 years (2006, 2008 and 2010) and 13,768 IDs. I executed the command xtdescribe and got:

         Freq.  Percent    Cum. |  Pattern*
     ---------------------------+----------
         8265     49.80   49.80 |  111
         2672     16.10   65.90 |  1..
         2241     13.50   79.40 |  11.
         1779     10.72   90.12 |  ..1
          923      5.56   95.69 |  .11
          413      2.49   98.17 |  .1.
          303      1.83  100.00 |  1.1
     ---------------------------+----------
        16596    100.00         |  XXX
     --------------------------------------

*Each column represents 2 periods.

I want to keep just IDs that have observations for all three years. I tried to implement a simple command like

keep pid if syear == 2006 & syear == 2008 & syear == 2010

and

keep if syear == 2006 & syear == 2008 & syear == 2010

but both are wrong: in the first case the syntax is invalid, and in the second case I just deleted all observations.

How can I keep observations only for IDs that have observations through the whole time period (for three years 2006, 2008 and 2010)?

1 Answer

When Stata tests conditions it tests them observation by observation. It does not look at other observations unless you force that by using subscript syntax. Now the condition

syear == 2006 & syear == 2008 & syear == 2010

is never true, because it is asking that syear be 3 different values in the same observation. So nothing is kept and everything is dropped. It is like saying

my age is 29 and my age is 31 and my age is 33

which (regardless of your age) is not true at the same instant.
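A quick way to see this for yourself, as a minimal sketch that assumes only the variable name syear from the question: count the observations that satisfy each condition before dropping anything. The first count should report 0, because no single observation can hold three different years; the second asks whether the year is any one of the three.

```stata
* Never true within one observation: syear cannot equal three values at once
count if syear == 2006 & syear == 2008 & syear == 2010

* By contrast, "or" logic asks whether syear is any one of the three years
count if inlist(syear, 2006, 2008, 2010)
```

Using count rather than keep or drop leaves the data untouched while you test a condition.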

In your case, where each ID appears at most once per year, this would work:

bysort pid : keep if _N == 3 

This is closer to the logic you tried:

bysort pid : egen npanel = total(inlist(syear, 2006, 2008, 2010)) 
keep if npanel == 3 

Or, instead of the inlist() version, you could go

bysort pid : egen npanel = total(syear == 2006 | syear == 2008 | syear == 2010)
keep if npanel == 3

For the problem you describe, the first solution is best, but the technique in the other two can help in other situations.
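As a hedged sketch of how you might check the result, the lines below keep the complete panels and then use the subscript syntax mentioned earlier to confirm that every remaining ID has exactly the years 2006, 2008 and 2010. It assumes the variable names pid and syear from the question, and that pid and syear together identify an observation.

```stata
* Keep only IDs observed three times, sorting years within each ID
bysort pid (syear) : keep if _N == 3

* Stop with an error if any pid-syear combination is duplicated
isid pid syear

* Subscripts refer to other observations within the same ID:
* the 1st, 2nd and 3rd rows must be the three survey years
bysort pid (syear) : assert syear[1] == 2006 & syear[2] == 2008 & syear[3] == 2010
```

If both checks pass silently, rerunning xtdescribe should show only the 111 pattern.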