Rolling standard deviation with more than 3 observations in Stata

Question

I am trying to calculate the rolling standard deviation for some economic variable (let's call it X) over 10 previous years. I work with a panel data set, which contains 500 firms (id1-id500) and 20 years (1995-2015). I have to calculate the standard deviation for each year and each firm starting from the year 2006 on the rolling basis (rolling window over 10 years). I've manged to do this with the following code:

tsset id year
rolling sd_X=r(sd), window(10) keep(year) saving(sd_X, replace): sum X
merge 1:1 id year using "sd_X.dta"

This code works fine, but I have a small problem: X variable contains a lot of missing values and I need more than 3 observations for X variable to calculate standard deviations. My solution was to calculate standard deviations using -rolling- (as I showed before) and than replace all standard deviations with missing values, if there were calculated using less than 3 observations.

I tried to write a loop, which counts all non-missing values in the period of 10 previous years for each firm and year using rolling window, but unfortunately it didn't work well:

   gen count1 = 0
   forvalues i = 1995/2015 {
   forvalue z = 1/500 {
   count if year == `i' & id == `z' & X != . 
   replace count1 = count1[_n-1] + r(N)
    }
    }

As a result, I've got only missing values for count1. The results, which I want to get, has to look like this:

 firm   year     X      count1
  z      i

 id1   1995    m.v.       0
 ........................
 id1   2011    m.v.       0
 id1   2012     1         0
 id1   2013     1         1
 id1   2014     21        2
 id1   2015     4         3
 ........................
 id2   1995     m.v.      0
 ........................
 id2   2011     m.v.      0
 id2   2012     m.v.      0
 id2   2013     m.v.      0
 id2   2014      42       1
 id2   2015      46       2
 and so on...

Does anybody know nice way to do this?

You need a -replace- in the command to change -count1-. Your examples are not consistent on the names or values of identifier variables, so it's hard to follow exactly what you did. — Nick Cox
Thank you, you are right. I forgot to add replace. I've edited my question as well and I hope that now it is more clear. Robert Picard has suggested a very nice solution for my problem, I think I just use -tsegen-. — In777

Robert Picard Robert Picard · Accepted Answer · 2015-09-10T14:12:35

Take a look at tsegen (from SSC). To install it, type in Stata's command window

ssc install tsegen

Here's a quick example of a rolling sd over an 11 year window that includes the current observation. A minimum of 3 non-missing observations is requested.

* ------------- data setup -------------
set seed 1234
clear

* create firms and initial earnings
set obs 500
gen id = _n
gen x = runiform()

* for each firm, create 20 years of x
expand 20
bysort id: gen year = 1985 + _n
replace x = x + runiform()

* create some missing  values
replace x = . if runiform() < .1

* can also have some missing obs
drop if runiform() < .1

* Declare data to be panel data
tsset id year

* ------------- end of data setup -------------

tsegen sd_x = rowsd(L(0/10).x, 3)

* spot check
sum x if id == 1 & inrange(year,1993,2003)
list if id == 1 & year == 2003

Rolling standard deviation with more than 3 observations in Stata

1 Answers