I have panel data running from year t1 to year t2. Some individuals enter the sample after t1 and/or exit the sample before t2. For efficiency (the sample is large), the dataset only contains rows for the years in which individuals are observed.
I want to add a new observation per individual, containing the year after an individual left the sample. So, if someone left in, say, 2003, I want the new observation to contain the individual's id and the value 2004 in the year variable. Every other variable in that observation should be missing.
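To make the target concrete, here is a small sketch on a hypothetical toy panel. The variable names id, year and x and all the values are made up for illustration; they are not taken from my real data.

```stata
* Hypothetical toy panel: id 1 is observed 2001-2003, id 2 only in 2002
clear
input id year x
1 2001 10
1 2002 11
1 2003 12
2 2002 20
end

* Desired result after the operation, sorted by id and year:
*   id 1: 2001, 2002, 2003, plus a new row with year 2004 and x missing
*   id 2: 2002, plus a new row with year 2003 and x missing
list, sepby(id)
```

The list command only shows the starting data; the comments describe the rows I would like to end up with.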
This is my approach, using a sample dataset:
webuse nlswork, clear
* Plenty of lines of code modifying the dataset go here ... for generality *
timer on 1
preserve
keep id year
bysort id (year) : keep if _n == _N
replace year = year + 1
save temp.dta, replace
restore
append using temp.dta
sort id year
erase temp.dta
timer off 1
timer list
I think this might be a bit inefficient, as it involves a preserve/restore, saving and erasing an additional dataset, and an append, all relatively time-consuming actions. An option such as tsfill, last would be amazing, but that option doesn't exist. Is anyone aware of a more efficient method? The code above includes timer, so anyone can benchmark it against another method.
preserve/restore is not necessary strictly speaking. – user8682794
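
As an illustration of that point, the following is a minimal sketch of one way to avoid preserve/restore and the temporary file altogether, using expand with its generate() option. It is a swapped-in technique, not the code from the question. The helper variables islast and isnew are hypothetical names introduced here, idcode is the full name of the identifier in nlswork (the question's id is an abbreviation of it), and the loop assumes that every remaining variable is numeric, which holds for nlswork but may not hold for other datasets. Whether it is actually faster has not been verified; it can be wrapped in the same timer calls to check.

```stata
webuse nlswork, clear

* Flag each individual's last observed year
bysort idcode (year) : gen byte islast = _n == _N

* Duplicate only the flagged rows; isnew is 1 for the added copies, 0 otherwise
expand 2 if islast, generate(isnew)
replace year = year + 1 if isnew

* Set everything except the identifier and year to missing in the new rows
* (assumption: all remaining variables are numeric, as in nlswork)
ds idcode year islast isnew, not
foreach v of varlist `r(varlist)' {
    quietly replace `v' = . if isnew
}

* Remove the helper variables and restore the panel ordering
drop islast isnew
sort idcode year
```

If the dataset contains string variables, the replace inside the loop would need a branch that assigns "" instead of the numeric missing value.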