1
votes

I am trying to construct centered moving average in SAS. my table is in below

   date       number   average
01/01/2015     18       ...
01/01/2015     15       ...
01/01/2015     5        ...
02/01/2015     66       ...
02/01/2015     7        ...
03/01/2015     7        ...
04/01/2015     19       ...
04/01/2015     7        ...
04/01/2015     11       ... 
04/01/2015     17       ...
05/01/2015     3        ...
06/01/2015     7        ...
   ...        ...       ...

I need to obtain the average number for a surrounding period over (-2,+2) days, instead of (-2,+2) observations

I know that for Centered Moving Average, I can use.

 convert number=av_number/transformout=(cmovave 3)

but here we have different number of observations in each day.

Anyone can tell me how to include only (-2, +2) days of centered moving average in this case ?

Thanks in advance !

Best

1
I think you should convert it first to daily observations (or monthly or whatever), then convert to the centered moving average.Joe
@joe thanks. Did you mean i should aggregate the daily value first, and then work out the average ?Tristan Sun
I assume you're using PROC EXPAND; it can do both things (aggregate or expand the observations to one per month, and then create the centered moving average). I'm not an expert in that area so I'm not sure if you can do it at once or not.Joe

1 Answers

2
votes

The suggestion from @Joe to aggregate to a daily level is the right approach, however you have to be careful that you don't lose the number of entries per day, otherwise you won't calculate the correct moving average. In other words, you need to weight the daily value by the number of entries for that day.

I've taken 3 steps to calculate the moving average, it may be possible to do it in 2 but I can't see how.

Step 1 is to calculate the sum and count of number per day.

Step 2 is to calculate the moving 5 day sum for both variables.

Step 3 then divides the sum by the count to get the weighted 5 day average.

I've added the trim function to exclude the first and last 2 records, obviously you can include those if you wish. You'll probably want to drop some of the extra variables as well.

data have;
input date :ddmmyy10. number;
format date date9.;
datalines;
01/01/2015  18
01/01/2015  15
01/01/2015  5
02/01/2015  66
02/01/2015  7
03/01/2015  7
04/01/2015  19
04/01/2015  7
04/01/2015  11
04/01/2015  17
05/01/2015  3
06/01/2015  7
;
run;

proc summary data=have nway;
class date;
var number;
output out=daily_agg sum=;
run;

proc expand data=daily_agg out=daily_agg_mov_sum;
convert number=tot_number / transformout = (cmovsum 5 trim 2);
convert _freq_=tot_count / transformout = (cmovsum 5 trim 2);
run;

data want;
set daily_agg_mov_sum;
if not missing(tot_number) then av_number = tot_number / tot_count;
run;