Edit : I am working on a solution and have been quite successful so far. I will post it for future reference as soon as I have it completed and fully running.
I am studying stock returns and have been thinking about this problem for some time now. I am still new to SAS, which makes this particularly challenging for me.
I have the following dataset (simplified).
PERMNO DATE RETURN
10078 1995JAN02 0.012000
10104 1995JAN02 0.050000
10107 1995JAN02 -0.019000
10078 1995JAN03 0.002000
10104 1995JAN03 0.004000
10107 1995JAN03 -0.009000
...
10078 1996JAN02 -0.011000
10104 1996JAN02 -0.020000
10107 1996JAN02 0.010000
...
10078 2013JAN03 0.050000
10104 2013JAN03 0.012000
10107 2013JAN03 -0.034000
PERMNO
identifies a stock, DATE
identifies a date (yyyymmmdd), RETURN
is the daily stock return.
I have simplified this example for only three stocks (10078, 10104, 10107).
My goal is to have daily correlation matrices between the three different stocks with a one year observation window (252 past observations, roughly) outputted to daily files.
Given my dataset, this should start in 1996JAN03
(roughly). So, for the first day it should calculate the correlation coefficient (so PROC CORR) between all the observations between 1995JAN02
and 1996JAN02
of all the different PERMNO
. i.e. Then I should be able to output the following matrix
10078 10104 10107
10078 1 0.1 0.3
10104 0.1 1 0.2
10107 0.3 0.2 1
And for the next day, 1996JAN04
, I should be able to output the same matrix using the observations between 1995JAN03
and 1996JAN03
. It is important that the final procedure is able to handle non-matching data sets in DATE
. i.e. If one stock only has prices after 1995JAN05
, then it should only use those dates, and if one of the stocks does not have a record for 1995SET05
, it should be able to ignore that observations and use only the remaining "matching" ones.
My question is quite similar to this one Calculating rolling correlations in SAS but my data is not organized as a matrix. I am quite tempted to apply the suggested approach (still stuyding it) but I would have to "convert" my data into a matrix - if that would be the most efficient way, I am willing to do it (but I would need help as I really do not know how to, although I can describe the intended output).
(Feel free to skip the following part if you have a solution) What I have thought to tackle this problem (but have not yet translated to SAS code) is (assuming n stocks):
- Count number of unique
PERMNO
(possible through PROC SQL); - Define a starting date (
1996JAN02
); - Create a macro that does the following steps (4-8):
- For each
PERMNO
, obtain an list of dates for that PERMNO after the starting date (possible through PROC SQL); - For each
DATE
in step 4, iterate through the remaining stocks (since the output is a triangular matrix, I could go to the n - 1 - j remaining stocks, where j is the current stock from 1 to n) and create a table with return1 and return2 with matching dates between the (current date - 1) and (current date - 253 observations); - Calculate PROC CORR for the table created in step 5.
- Save this output (somehow) to a dataset of that specific day and keep doing this until I have exhausted all days for this Stock after the starting date;
- Go on to the next
PERMNO
. - Write the data sets through PROC EXPORT.
However, this is very much MATLAB-y or VBA-y, which are the languages I am familiar with, and certainly not the most efficient approach in SAS.
I thank you in advance for your help.
PROC SETINIT; RUN;
and see if it lists SAS/ETS. – Joe