1
votes

I have a panel data from year t1 to t2. Some individuals enter the sample after t1 and/or exit the sample before t2. For efficiency (large sample), the dataset only contains rows for years when individuals are observed.

I want to add a new observation per individual, containing the year after an individual left the sample. So, if someone left in, say 2003, I want the new observation to contain the individual's id and the value 2004 in the year variable. Every other variable in that observation should be missing.

This is my approach, using a sample dataset:

webuse nlswork, clear

* Here goes plenty of lines of codes modifying the dataset ... for generality *

timer on 1

preserve
keep id year
bysort id (year) : keep if _n == _N
replace year = year + 1
save temp.dta, replace
restore

append using temp.dta
sort id year
erase temp.dta

timer off 1
timer list 

I think this might be a bit inefficient, as it includes a preserve/restore, saving/deleting an additional database, and an append, all relatively time-consuming actions. Something like tsfill, last would be amazing, but that option doesn't exist. Is anyone aware of a more efficient method? The code above includes timer, so anyone can benchmark it against another method.

1
The preserve / restore is not necessary strictly speaking.user8682794
@PearlySpencer you mean by changing the order of dataset loading? That's the case in this example, but in reality I do many stuff with the dataset, before.luchonacho
@NickCox updatedluchonacho

1 Answers

1
votes

I am never that impressed by attempts to save seconds when coding takes minutes. This is more direct than your approach.

bysort id (year) : gen byte last = _n == _N 
expand 2 if last 
bysort id (year) : replace year = year + 1 if _n == _N 

EDIT: You need to loop over the other variables in your dataset to replace their values with missing. For simplicity, I will assume that they are all numeric.

bysort id (year) : replace last = _n == _N 
ds id year, not 
quietly foreach v in `r(varlist)' { 
    replace `v' = . if last 
}