4
votes

This is a follow up question to another question of mine, Efficiently fitting cubic splines in SAS to specific grid of objects - the previous solution worked, but was not nearly as efficient as I imagine they can be.

The purpose of this code is to interpolate values of an image IV for a specific range of an object variable MNES, by groups, and perform operations on the resulting values.

My data-set is as follows:

MEMBID    DATE  MNES     IV
     1   13152  0.84   0.40
     1   13152  0.89   0.34
     1   13152  0.91   0.31
     1   13152  0.93   0.29
     1   13152  0.95   0.26
     1   13152  0.98   0.23
     1   13152  0.99   0.22
     1   13152  1.00   0.22
     1   13152  1.02   0.20
     1   13152  1.04   0.18
     1   13152  1.07   0.18
     ...
     4   13172  0.89   0.24
     4   13172  0.94   0.20
     4   13172  0.97   0.19
     4   13172  1.01   0.19
     4   13172  1.04   0.20
     4   13172  1.07   0.21
     4   13172  1.15   0.23
     4   13172  1.17   0.24
     ...

Each MEMBID and DATE combination is a by-group. For perspective, I should have roughly two million by-groups.

Each of these by-groups should be used to estimate IV for a grid of 1001 MNES, through fitting a cubic splines to the existing IV-MNES pairs. This grid goes from 1/3 to 3 and is not linear, but rather estimated as follows (this has to do with the methodology I am applying - graphically):

%let m = 500;
%let k = 2;
%let u = (1+&k) ** (1/&m);
data splined(keep=mnes);
    do grid=-500 to 500;
        mnes = (&u) ** grid;
    end;
run;

Each of the by groups are then synthesized to a single figure, by calculating prices of options for each row and summing a certain function of that by group. Given the volume of the resulting data set (i.e. 1001 grid objects x 2 million by groups = 2 billion rows), I do not know if it is efficient to do to this all at once - but looping also does not seem smart.

I have researched this time and again, and from what I gather, there are several built-in tools to estimate cubic splines in SAS:

  • BASE SAS has this implemented through the MSPLINT function. While I have implemented this (see previous post, optional) it requires me to use a great amount of macro loops, as it requires me to individually specify the MNES and IV of each by group in a macro variable, through the use of a comma separated list. Doing 2 million iterations in loops is far from a good idea - it is taking far too long in my current implementation;
  • SAS/IML allows this through the splinec and splinev functions and I have succesfully implemented this (see previous post, optional) but IML does not seem ideal to deal with 2 million by groups;
  • SAS/ETS allows cubic splines to be fit through PROC EXPAND - however, PROC EXPAND requires a time series and integer values for the ID variable. This is wholly incompatible with my data, considering MNES is decimal and needs to have that specific number of steps (cannot be approximated to an integer);
  • SAS/STAT allows cubic splines to be fit through PROC TRANSREG (this is my last resort but it also seems the most promising one), by using the available spline transformation. It even supports by group processing. It outputs very good looking plots, but I am unable to understand how I can get an output dataset for the specific grid I need. Alternatively, I would like to obtain the parameters of the spline and estimate the grid "manually" - but this seems impossible. If I only fit a third degree polinomial, then I can get the parameters (through the out= coef option), but not if I want an actual spline.

If you are unaware of how a cubic spline works, this paper explains it in quite simple terms.

I have researched this thoroughly and I do not have an answer. I appreciate any suggestion on how to automate this process efficiently. Thank you.

Following the comment by @Joe, here are have and want data sets for 5 by groups. I have simplified the grid parameters, to only have m = 5, thus 11 points per by group. (Note that I took away some decimal places in have and want to improve presentation).

have

membid       date    days     iv    mnes
    -1  04JAN1996     365   0.15    0.91
    -1  04JAN1996     365   0.14    0.94
    -1  04JAN1996     365   0.14    0.96
    -1  04JAN1996     365   0.13    0.98
    -1  04JAN1996     365   0.13    1.00
    -1  04JAN1996     365   0.13    1.02
    -1  04JAN1996     365   0.12    1.04
    -1  04JAN1996     365   0.12    1.06
    -1  04JAN1996     365   0.12    1.08
    -1  04JAN1996     365   0.11    1.09
    -1  04JAN1996     365   0.11    1.11
    -1  04JAN1996     365   0.11    1.13
    -1  05JAN1996     365   0.15    0.91
    -1  05JAN1996     365   0.14    0.94
    -1  05JAN1996     365   0.14    0.96
    -1  05JAN1996     365   0.13    0.98
    -1  05JAN1996     365   0.13    1.00
    -1  05JAN1996     365   0.13    1.02
    -1  05JAN1996     365   0.12    1.04
    -1  05JAN1996     365   0.12    1.06
    -1  05JAN1996     365   0.12    1.08
    -1  05JAN1996     365   0.11    1.09
    -1  05JAN1996     365   0.11    1.11
    -1  05JAN1996     365   0.11    1.13
    -1  08JAN1996     365   0.14    0.91
    -1  08JAN1996     365   0.14    0.94
    -1  08JAN1996     365   0.13    0.96
    -1  08JAN1996     365   0.13    0.98
    -1  08JAN1996     365   0.12    1.00
    -1  08JAN1996     365   0.12    1.02
    -1  08JAN1996     365   0.13    1.04
    -1  08JAN1996     365   0.12    1.06
    -1  08JAN1996     365   0.12    1.08
    -1  08JAN1996     365   0.12    1.10
    -1  08JAN1996     365   0.11    1.11
    -1  08JAN1996     365   0.11    1.13
    -1  09JAN1996     365   0.15    0.91
    -1  09JAN1996     365   0.14    0.94
    -1  09JAN1996     365   0.14    0.96
    -1  09JAN1996     365   0.14    0.98
    -1  09JAN1996     365   0.13    1.00
    -1  09JAN1996     365   0.13    1.02
    -1  09JAN1996     365   0.12    1.04
    -1  09JAN1996     365   0.12    1.06
    -1  09JAN1996     365   0.12    1.08
    -1  09JAN1996     365   0.11    1.09
    -1  09JAN1996     365   0.11    1.11
    -1  09JAN1996     365   0.11    1.13
    -1  10JAN1996     365   0.15    0.91
    -1  10JAN1996     365   0.15    0.94
    -1  10JAN1996     365   0.14    0.96
    -1  10JAN1996     365   0.14    0.98
    -1  10JAN1996     365   0.13    1.00
    -1  10JAN1996     365   0.13    1.02
    -1  10JAN1996     365   0.13    1.05
    -1  10JAN1996     365   0.12    1.06
    -1  10JAN1996     365   0.12    1.08
    -1  10JAN1996     365   0.12    1.10
    -1  10JAN1996     365   0.12    1.12
    -1  10JAN1996     365   0.11    1.14

want

grid    membid     date    days      mnes         iv
  -5        -1    13152     365    0.3333    0.15207 
  -4        -1    13152     365    0.4152    0.15207 
  -3        -1    13152     365    0.5172    0.15207 
  -2        -1    13152     365    0.6443    0.15207 
  -1        -1    13152     365    0.8027    0.15207 
   0        -1    13152     365    1         0.13631 
   1        -1    13152     365    1.2457    0.11354 
   2        -1    13152     365    1.5518    0.11354 
   3        -1    13152     365    1.9331    0.11354 
   4        -1    13152     365    2.4082    0.11354 
   5        -1    13152     365    3         0.11354 
  -5        -1    13153     365    0.3333    0.15217 
  -4        -1    13153     365    0.4152    0.15217 
  -3        -1    13153     365    0.5172    0.15217 
  -2        -1    13153     365    0.6443    0.15217 
  -1        -1    13153     365    0.8027    0.15217 
   0        -1    13153     365    1         0.135057
   1        -1    13153     365    1.2457    0.115   
   2        -1    13153     365    1.5518    0.115   
   3        -1    13153     365    1.9331    0.115   
   4        -1    13153     365    2.4082    0.115   
   5        -1    13153     365    3         0.115   
  -5        -1    13156     365    0.3333    0.14731 
  -4        -1    13156     365    0.4152    0.14731 
  -3        -1    13156     365    0.5172    0.14731 
  -2        -1    13156     365    0.6443    0.14731 
  -1        -1    13156     365    0.8027    0.14731 
   0        -1    13156     365    1         0.131243
   1        -1    13156     365    1.2457    0.11656
   2        -1    13156     365    1.5518    0.11656
   3        -1    13156     365    1.9331    0.11656
   4        -1    13156     365    2.4082    0.11656
   5        -1    13156     365    3         0.11656
  -5        -1    13157     365    0.3333    0.15556
  -4        -1    13157     365    0.4152    0.15556
  -3        -1    13157     365    0.5172    0.15556
  -2        -1    13157     365    0.6443    0.15556
  -1        -1    13157     365    0.8027    0.15556
   0        -1    13157     365    1         0.13881
   1        -1    13157     365    1.2457    0.11447
   2        -1    13157     365    1.5518    0.11447
   3        -1    13157     365    1.9331    0.11447
   4        -1    13157     365    2.4082    0.11447
   5        -1    13157     365    3         0.11447
  -5        -1    13158     365    0.3333    0.15920
  -4        -1    13158     365    0.4152    0.15920
  -3        -1    13158     365    0.5172    0.15920
  -2        -1    13158     365    0.6443    0.15920
  -1        -1    13158     365    0.8027    0.15920
   0        -1    13158     365    1         0.14023
   1        -1    13158     365    1.2457    0.11923
   2        -1    13158     365    1.5518    0.11923
   3        -1    13158     365    1.9331    0.11923
   4        -1    13158     365    2.4082    0.11923
   5        -1    13158     365    3         0.11923

The want dataset was generated with the following code. optday is a dataset that has only one by-group of the have dataset (I use proc sql and macro loops to "get there"). contribution and option price are the other fields I need.

data _null_;
    set optday end=last;
    if _n_ = 1 then do;
        call symput("first1",mnes);
        call symput("first1_v",iv);
    end;
    if last then do;
        call symput("last1",mnes);
        call symput("last1_v",iv);
    end;
run;

proc sql noprint;
    select mnes into:mneslist
    separated by ','
    from optday;
    select iv into:IVlist
    separated by ','
    from optday;
    select count(*) into:countlist
    from optday;
quit;

data want;
    do grid=-500 to 500;
        membid = &currmembid;
        date = "&currdate"d;
        days = &currdays;
        mnes = (&u) ** grid;
        if mnes < &first1 then IV = &first1_v;
        if mnes > &last1 then IV = &last1_v;
        if mnes >= &first1 and mnes <= &last1 then IV = msplint(mnes, &countlist, &mneslist, &IVlist);
        if mnes < 1 then optionprice = BLKSHPTPRC(mnes,&currdays/365,1,&currrate/100,iv);
        else optionprice = BLKSHCLPRC(mnes,&currdays/365,1,&currrate/100,iv);
        contrib = (1-(log(1+&k)/(&m))*grid)*optionprice/(&u)**grid;
        output;
    end;
run;

Note that the grid mechanism used in this post was taken from the MATLAB code publicly made available by Buss and Vilkov in Vilkov's website under Model-Free Implied Measures from Options Data: http://www.vilkov.net/www/content/code-and-useful-stuff

1
My feeling is that the data step solution is probably workable into a by group method. Could you post a 'have' and 'want' dataset for, say, 5 by groups - 'have' being the initial dataset and 'want' being what the splines should be for those 5 by groups (or whatever is appropriate)? Use whatever method you want to produce that, IML or data step or whatever. - Joe
Hey @Joe. Thank you for your help. I have the have and want data sets but the want is far too big. Should I attach it or just redact parts of it? I am unaware if attaching is proper etiquette. - sasstudent
Can you simplify the question further - say instead of -500 to 500 do -5 to 5 or something? It would be best if you can simplify it in some fashion that doesn't hurt the overall problem, but can still be pasted in the question. - Joe
That was a good suggestion, thank you. I have added the want and have datasets. - sasstudent
I'm working on something promising. Appending the 1001 mnes observation data set, with IV as mising, with the mnes and iv observations and then running proc transreg on that. It interpolates the missing values. This would support by group processing. I'll post it if I am successful. Still, I appreciate any help. Thank you. - sasstudent

1 Answers

0
votes

I was able to solve this issue and am therefore sharing my solution. I believe it is quite efficient - it can run all the process I need in 3 to 4 hours. The previous estimate I had was of 10 days.

I follow these steps:

  1. Sort have by membid, dates, and days;
  2. Create mnes grid, giving missing value to iv;
  3. Append observations to that grid (this step may be inefficient but I do not know how to manipulate the datastep to do this in only one step);
  4. Sort information again by groups;
  5. Apply PROC TRANSREG by groups - it assumes constant values outside the range of known mnes and interpolates for the remaining missing ones.
  6. Take out values which are the observed, so that only the grid remains.

Shortcomings of this: I cannot specify dynamically that I want one spline knot at each observed value. Consequently, I have to give a certain number of knots and it will space them equally between the observed values. This makes it so that the values outside the known mnes range are not the final iv in the grid, but rather another value.

Here is the code I used, with example information:

data have;
    input membid date days iv mnes;
    datalines;
    1 1 365 0.15 0.91
    1 1 365 0.14 0.94
    1 1 365 0.14 0.96
    1 1 365 0.13 0.98
    1 1 365 0.13 1.00
    1 1 365 0.13 1.02
    1 1 365 0.12 1.04
    1 1 365 0.12 1.06
    1 1 365 0.12 1.08
    1 1 365 0.11 1.09
    1 1 365 0.11 1.11
    1 1 365 0.11 1.13
    2 2 365 0.15 0.91
    2 2 365 0.14 0.94
    2 2 365 0.14 0.96
    2 2 365 0.13 0.98
    2 2 365 0.13 1.00
    2 2 365 0.13 1.02
    2 2 365 0.12 1.04
    2 2 365 0.12 1.06
    2 2 365 0.12 1.08
    2 2 365 0.11 1.09
    2 2 365 0.11 1.11
    2 2 365 0.11 1.13
     ;
run;

* Set model inputs;
%let m = 500;
%let k = 2;
%let u = (1+&k) ** (1/&m);
%let a = 2 * (&u-1);

* Needed to sort data by groups;
proc sort data=have;
    by membid date days mnes;
run;

* Create moneyness grid for each by group;
data _splined;
    set have;
    by membid date days;
    if first.days then do grid=(-&m) to &m;
        mnes = (&u) ** grid;
        iv = .;
        output;
    end;
run;

* Append observations to generated moneyness grid;
proc datasets nolist;
    append base=_splined
    data=have;
quit;

* Sort these, because first come the moneyness grids and then all the observations;
proc sort data=_splined;
    by membid date days mnes;
run;

* Apply cubic splines with 10 equally spaced knots;
proc transreg data=_splined short noprint;
    model identity(iv)=spline(mnes / nknots=10);
    output out=output1 p;
    by membid date days;
run;

* Delete initial observations - these are marked by proc transreg with _type_ = 'SCORE';
data splined(keep=membid date days piv mnes);
    set output1(keep=membid date days _type_ piv mnes);
    if _type_ NE 'SCORE';
run;

If you have any questions, please comment and I will try to help!

Note that the grid mechanism used in this post was taken from the MATLAB code publicly made available by Buss and Vilkov in Vilkov's website under Model-Free Implied Measures from Options Data: http://www.vilkov.net/www/content/code-and-useful-stuff