Coding in Stata:

In my unbalanced weekly panel dataset that spans 5 years, I seek to:

  1. Fill in weeks that are skipped. I am using the tsfill command for this.
  2. However, I don't want to fill a gap that is longer than 5 consecutive weeks. That is, if a gap is 5 weeks or shorter, generate the missing weeks with zero values; if it is longer than 5 weeks, leave it unfilled.

The second step constitutes a challenge for me. Any suggestions?

sample original:

id week var1 var2 var3
1   1    0    3    0
1   3    1    0    0 
1   5    1    0    0
1   20   0    4    0

sample desired:

id week var1 var2 var3
1   1    0    3    0
1   2    0    0    0   (new row!)
1   3    1    0    0 
1   4    0    0    0   (new row!)
1   5    1    0    0
1   20   0    4    0
How ginormous is your dataset? Might it be manageable to do a complete fill and then drop the rows you don't want? - Matthew Gunn
Nearly 300,000 obs. I don't think it's feasible. - Olga
One approach might be to: (1) use tsfill to fill in everything, then (2) gen DROPME = 1, (3) iterate through by id and time and set DROPME = 0 if there's a non-missing observation within 5 weeks, (4) drop every row with DROPME = 1. - Matthew Gunn
Thank you Matthew. Can you please write it up as an answer? I have a bit of a hard time understanding the notation, especially step 3. - Olga
I'm not really a Stata coder... I'd have to look up syntax everywhere. I'm just thinking through it logically. The two options you really have are: (1) use tsfill and drop the rows you don't want, or (2) essentially write your own version of tsfill. - Matthew Gunn

1 Answer

I think I found the answer.

iri_key week    units
1   1   2
1   3   3
1   4   5
1   6   7
1   15  2
2   1   5
2   2   7
2   3   3
2   4   6
2   6   4


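* tsspell is user-written; install it once with: ssc install tsspell
tsset iri_key week
gen byte observed = 1                 // marks rows that exist before filling
tsfill                                // fill gaps within each panel only
gen byte check = missing(observed)    // 1 for rows added by tsfill, 0 otherwise
replace units = 0 if check            // filled weeks get zero; true zeros are left alone
tsspell, cond(check == 1)             // creates _spell, _seq and _end for runs of filled weeks
bysort iri_key _spell: egen runlen = max(_seq)   // length of each run
drop if check & runlen > 5            // drop whole runs longer than 5 weeks
drop observed check runlen _spell _seq _end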
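
If a full fill is a concern at around 300,000 observations, the second option from the comments (writing your own version of tsfill) can be sketched with built-in commands only. This is a minimal sketch, not a tested solution for the real data: it uses the sample from the question, assumes each (id, week) pair occurs at most once, and the helper variables gap and step are names I made up for illustration.

```stata
clear
input id week var1 var2 var3
1  1 0 3 0
1  3 1 0 0
1  5 1 0 0
1 20 0 4 0
end

* gap = number of skipped weeks between this row and the next row of the same id
* (missing for the last row of each id, so that row is never expanded)
bysort id (week): gen gap = week[_n+1] - week - 1

* add one copy of the row per skipped week, but only for gaps of 1 to 5 weeks
expand gap + 1 if inrange(gap, 1, 5)

* rows within an (id, week) group are identical at this point; number them 0, 1, 2, ...
bysort id week: gen step = _n - 1

* the copies (step > 0) become the skipped weeks, with zero values
foreach v of varlist var1 var2 var3 {
    replace `v' = 0 if step > 0
}
replace week = week + step

drop gap step
sort id week
list, sepby(id)
```

On the sample this should give weeks 1, 2, 3, 4, 5 and 20 for id 1, with weeks 2 and 4 as new all-zero rows and the 14-week gap before week 20 left unfilled. Because only the needed rows are created, nothing has to be dropped afterwards.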