0
votes

I am coding in SAS Enterprise Guide 4.2. I am trying to calculate the Quandt's log likelihood ratio. But it is not important to understand that to understand my question.

The ratio is based on sequential regressions. Namely regressions from 1 to t0 where 1<=t0<=T and T is the samplesize. Illustration:

  • First perform regression on the first observation
  • Then perform regression on the first two observations
  • Then perform regression on the first 3 observations
  • ...and so on

It is also performing a "forward regression" from t0+1 to T. Illustration:

  • First perform regression on the last T-1 observations
  • Then perform regression on the last T-2 observations
  • Then perform regression on the last T-3 observations
  • ...and so on

The regression is an Ordinary Least Squares regression.

After the regression is performed, the square of the residuals are summed.

So this is what I need. For each observation t0 I want to:

  • do an OLS regression from 1 to t0 and sum up the square of the residuals
  • do an OLS regression from t0+1 to T and sum up the square of the residuals

The data consists of one group variable, one dependent variable and one independent variable. The calculations should be performed grouped by the group variable (but that should'nt be too difficult).

I have been able to do part of this task myself, but it is horribly ineffeicient and since the data consists of over 1,000,000,000 observations efficiency is very important

I have also noticed that the procedure "autoreg" calculates the CUSUM statistic that is also based on sequential regression and therefore I suspect that this functionality could be availible in SAS but I haven't been able to find it.

And the part I am struggling with most right now is the summation. Simple example of the summation I want to do:

Input:
col1 col2
1     2
2     5
5     4
7     6

Output:
col3
2    =1*2
15   =1*5+2*5
32   =1*4+2*4+5*4
90   =1*6+2*6+5*6+7*6

Has anyone encounter a similar problem or have any idea on how to solve it in an efficient way? All help is welcome and feel free to ask me to clarify something if it is unclear.

1

1 Answers

0
votes

As far as the summation goes, the below should work (though your input dataset must be sorted by group first).

Since the summation you're asking for is basically col2 multiplied by the cumulative sum of col1 within each group, you can use a retain statement to keep track of the sum of col1, and by-group processing to reset the cumulative sum each time the data step encounters a new group.

data output;
  retain cusum;
  set input;
  by group;
  if first.group then cusum = col1;
  else cusum = cusum + col1;
  col3 = cusum * col2;
  drop cusum;
run;