This is a follow up question to another question of mine, Efficiently fitting cubic splines in SAS to specific grid of objects - the previous solution worked, but was not nearly as efficient as I imagine they can be.
The purpose of this code is to interpolate values of an image IV for a specific range of an object variable MNES, by groups, and perform operations on the resulting values.
My data-set is as follows:
MEMBID DATE MNES IV
1 13152 0.84 0.40
1 13152 0.89 0.34
1 13152 0.91 0.31
1 13152 0.93 0.29
1 13152 0.95 0.26
1 13152 0.98 0.23
1 13152 0.99 0.22
1 13152 1.00 0.22
1 13152 1.02 0.20
1 13152 1.04 0.18
1 13152 1.07 0.18
...
4 13172 0.89 0.24
4 13172 0.94 0.20
4 13172 0.97 0.19
4 13172 1.01 0.19
4 13172 1.04 0.20
4 13172 1.07 0.21
4 13172 1.15 0.23
4 13172 1.17 0.24
...
Each MEMBID and DATE combination is a by-group. For perspective, I should have roughly two million by-groups.
Each of these by-groups should be used to estimate IV for a grid of 1001 MNES, through fitting a cubic splines to the existing IV-MNES pairs. This grid goes from 1/3 to 3 and is not linear, but rather estimated as follows (this has to do with the methodology I am applying - graphically):
%let m = 500;
%let k = 2;
%let u = (1+&k) ** (1/&m);
data splined(keep=mnes);
do grid=-500 to 500;
mnes = (&u) ** grid;
end;
run;
Each of the by groups are then synthesized to a single figure, by calculating prices of options for each row and summing a certain function of that by group. Given the volume of the resulting data set (i.e. 1001 grid objects x 2 million by groups = 2 billion rows), I do not know if it is efficient to do to this all at once - but looping also does not seem smart.
I have researched this time and again, and from what I gather, there are several built-in tools to estimate cubic splines in SAS:
- BASE SAS has this implemented through the MSPLINT function. While I have implemented this (see previous post, optional) it requires me to use a great amount of macro loops, as it requires me to individually specify the
MNESandIVof each by group in a macro variable, through the use of a comma separated list. Doing 2 million iterations in loops is far from a good idea - it is taking far too long in my current implementation; - SAS/IML allows this through the splinec and splinev functions and I have succesfully implemented this (see previous post, optional) but IML does not seem ideal to deal with 2 million by groups;
- SAS/ETS allows cubic splines to be fit through PROC EXPAND - however, PROC EXPAND requires a time series and integer values for the ID variable. This is wholly incompatible with my data, considering MNES is decimal and needs to have that specific number of steps (cannot be approximated to an integer);
- SAS/STAT allows cubic splines to be fit through PROC TRANSREG (this is my last resort but it also seems the most promising one), by using the available spline transformation. It even supports by group processing. It outputs very good looking plots, but I am unable to understand how I can get an output dataset for the specific grid I need. Alternatively, I would like to obtain the parameters of the spline and estimate the grid "manually" - but this seems impossible. If I only fit a third degree polinomial, then I can get the parameters (through the out= coef option), but not if I want an actual spline.
If you are unaware of how a cubic spline works, this paper explains it in quite simple terms.
I have researched this thoroughly and I do not have an answer. I appreciate any suggestion on how to automate this process efficiently. Thank you.
Following the comment by @Joe, here are have and want data sets for 5 by groups. I have simplified the grid parameters, to only have m = 5, thus 11 points per by group. (Note that I took away some decimal places in have and want to improve presentation).
have
membid date days iv mnes
-1 04JAN1996 365 0.15 0.91
-1 04JAN1996 365 0.14 0.94
-1 04JAN1996 365 0.14 0.96
-1 04JAN1996 365 0.13 0.98
-1 04JAN1996 365 0.13 1.00
-1 04JAN1996 365 0.13 1.02
-1 04JAN1996 365 0.12 1.04
-1 04JAN1996 365 0.12 1.06
-1 04JAN1996 365 0.12 1.08
-1 04JAN1996 365 0.11 1.09
-1 04JAN1996 365 0.11 1.11
-1 04JAN1996 365 0.11 1.13
-1 05JAN1996 365 0.15 0.91
-1 05JAN1996 365 0.14 0.94
-1 05JAN1996 365 0.14 0.96
-1 05JAN1996 365 0.13 0.98
-1 05JAN1996 365 0.13 1.00
-1 05JAN1996 365 0.13 1.02
-1 05JAN1996 365 0.12 1.04
-1 05JAN1996 365 0.12 1.06
-1 05JAN1996 365 0.12 1.08
-1 05JAN1996 365 0.11 1.09
-1 05JAN1996 365 0.11 1.11
-1 05JAN1996 365 0.11 1.13
-1 08JAN1996 365 0.14 0.91
-1 08JAN1996 365 0.14 0.94
-1 08JAN1996 365 0.13 0.96
-1 08JAN1996 365 0.13 0.98
-1 08JAN1996 365 0.12 1.00
-1 08JAN1996 365 0.12 1.02
-1 08JAN1996 365 0.13 1.04
-1 08JAN1996 365 0.12 1.06
-1 08JAN1996 365 0.12 1.08
-1 08JAN1996 365 0.12 1.10
-1 08JAN1996 365 0.11 1.11
-1 08JAN1996 365 0.11 1.13
-1 09JAN1996 365 0.15 0.91
-1 09JAN1996 365 0.14 0.94
-1 09JAN1996 365 0.14 0.96
-1 09JAN1996 365 0.14 0.98
-1 09JAN1996 365 0.13 1.00
-1 09JAN1996 365 0.13 1.02
-1 09JAN1996 365 0.12 1.04
-1 09JAN1996 365 0.12 1.06
-1 09JAN1996 365 0.12 1.08
-1 09JAN1996 365 0.11 1.09
-1 09JAN1996 365 0.11 1.11
-1 09JAN1996 365 0.11 1.13
-1 10JAN1996 365 0.15 0.91
-1 10JAN1996 365 0.15 0.94
-1 10JAN1996 365 0.14 0.96
-1 10JAN1996 365 0.14 0.98
-1 10JAN1996 365 0.13 1.00
-1 10JAN1996 365 0.13 1.02
-1 10JAN1996 365 0.13 1.05
-1 10JAN1996 365 0.12 1.06
-1 10JAN1996 365 0.12 1.08
-1 10JAN1996 365 0.12 1.10
-1 10JAN1996 365 0.12 1.12
-1 10JAN1996 365 0.11 1.14
want
grid membid date days mnes iv
-5 -1 13152 365 0.3333 0.15207
-4 -1 13152 365 0.4152 0.15207
-3 -1 13152 365 0.5172 0.15207
-2 -1 13152 365 0.6443 0.15207
-1 -1 13152 365 0.8027 0.15207
0 -1 13152 365 1 0.13631
1 -1 13152 365 1.2457 0.11354
2 -1 13152 365 1.5518 0.11354
3 -1 13152 365 1.9331 0.11354
4 -1 13152 365 2.4082 0.11354
5 -1 13152 365 3 0.11354
-5 -1 13153 365 0.3333 0.15217
-4 -1 13153 365 0.4152 0.15217
-3 -1 13153 365 0.5172 0.15217
-2 -1 13153 365 0.6443 0.15217
-1 -1 13153 365 0.8027 0.15217
0 -1 13153 365 1 0.135057
1 -1 13153 365 1.2457 0.115
2 -1 13153 365 1.5518 0.115
3 -1 13153 365 1.9331 0.115
4 -1 13153 365 2.4082 0.115
5 -1 13153 365 3 0.115
-5 -1 13156 365 0.3333 0.14731
-4 -1 13156 365 0.4152 0.14731
-3 -1 13156 365 0.5172 0.14731
-2 -1 13156 365 0.6443 0.14731
-1 -1 13156 365 0.8027 0.14731
0 -1 13156 365 1 0.131243
1 -1 13156 365 1.2457 0.11656
2 -1 13156 365 1.5518 0.11656
3 -1 13156 365 1.9331 0.11656
4 -1 13156 365 2.4082 0.11656
5 -1 13156 365 3 0.11656
-5 -1 13157 365 0.3333 0.15556
-4 -1 13157 365 0.4152 0.15556
-3 -1 13157 365 0.5172 0.15556
-2 -1 13157 365 0.6443 0.15556
-1 -1 13157 365 0.8027 0.15556
0 -1 13157 365 1 0.13881
1 -1 13157 365 1.2457 0.11447
2 -1 13157 365 1.5518 0.11447
3 -1 13157 365 1.9331 0.11447
4 -1 13157 365 2.4082 0.11447
5 -1 13157 365 3 0.11447
-5 -1 13158 365 0.3333 0.15920
-4 -1 13158 365 0.4152 0.15920
-3 -1 13158 365 0.5172 0.15920
-2 -1 13158 365 0.6443 0.15920
-1 -1 13158 365 0.8027 0.15920
0 -1 13158 365 1 0.14023
1 -1 13158 365 1.2457 0.11923
2 -1 13158 365 1.5518 0.11923
3 -1 13158 365 1.9331 0.11923
4 -1 13158 365 2.4082 0.11923
5 -1 13158 365 3 0.11923
The want dataset was generated with the following code. optday is a dataset that has only one by-group of the have dataset (I use proc sql and macro loops to "get there"). contribution and option price are the other fields I need.
data _null_;
set optday end=last;
if _n_ = 1 then do;
call symput("first1",mnes);
call symput("first1_v",iv);
end;
if last then do;
call symput("last1",mnes);
call symput("last1_v",iv);
end;
run;
proc sql noprint;
select mnes into:mneslist
separated by ','
from optday;
select iv into:IVlist
separated by ','
from optday;
select count(*) into:countlist
from optday;
quit;
data want;
do grid=-500 to 500;
membid = &currmembid;
date = "&currdate"d;
days = &currdays;
mnes = (&u) ** grid;
if mnes < &first1 then IV = &first1_v;
if mnes > &last1 then IV = &last1_v;
if mnes >= &first1 and mnes <= &last1 then IV = msplint(mnes, &countlist, &mneslist, &IVlist);
if mnes < 1 then optionprice = BLKSHPTPRC(mnes,&currdays/365,1,&currrate/100,iv);
else optionprice = BLKSHCLPRC(mnes,&currdays/365,1,&currrate/100,iv);
contrib = (1-(log(1+&k)/(&m))*grid)*optionprice/(&u)**grid;
output;
end;
run;
Note that the grid mechanism used in this post was taken from the MATLAB code publicly made available by Buss and Vilkov in Vilkov's website under Model-Free Implied Measures from Options Data: http://www.vilkov.net/www/content/code-and-useful-stuff