I am using Stata and have panel data with two periods, t1 and t2. I also have a unique identifier that is constant across periods if the person has responded over both periods.

For example, if person001 completes the survey in both t1 and t2, the answers for each period are stored under the identifier person001. This leads to two entries in the data set with the same identifier, one under t1 and one under t2.

However, some people have only completed the survey in one period, so their identifier appears only in either t1 or t2.

I would like a way to drop those that only appear in one period or the other.

I have tried

drop if identifier[_n-1] != identifier if period == t2 

but this simply drops all t1 observations.

1
Note that your statement is illegal. The second if should presumably be &. - Nick Cox

1 Answer

If your data is in long format, try

bysort identifier: drop if _N==1

or

bysort identifier: keep if _N==2

bysort sorts the data by identifier and then runs the command separately within each identifier group. _N is a system variable that holds the number of observations; under the by prefix it is the number of observations in the current group, rather than in the data as a whole. The first command drops identifiers that appear only once, and the second keeps identifiers that appear exactly twice. With two periods and at most one observation per identifier per period, these actions are equivalent.
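
To see the approach on a small case, here is a minimal sketch with made-up data. The variable names identifier and period follow the question, but the values, the numeric coding of period, and the helper variables n_obs, first, and n_periods are assumptions for illustration only. The second half is a more defensive variant, not part of the answer above: it counts distinct periods per identifier, so it still behaves sensibly if an identifier happens to be duplicated within a period.

```stata
* Toy long-format data (hypothetical values; period coded 1 and 2)
clear
input str9 identifier period
"person001" 1
"person001" 2
"person002" 1
"person003" 2
"person004" 1
"person004" 2
end

* Optional check: stops with an error if any identifier-period
* pair occurs more than once
isid identifier period

preserve

* Approach from the answer: _N under by: is the group size
bysort identifier: generate n_obs = _N
keep if n_obs == 2
drop n_obs
list, sepby(identifier)

restore

* Defensive variant (an assumption, not from the answer):
* flag the first row of each identifier-period pair, then
* count the flagged rows per identifier
bysort identifier period: generate byte first = (_n == 1)
bysort identifier: egen n_periods = total(first)
keep if n_periods == 2
drop first n_periods
list, sepby(identifier)
```

In both cases only person001 and person004 should remain, since they are the only identifiers present in both periods. If period is stored as a string in your data, the code works unchanged apart from the input step.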