I have a dataset mydat
with the following variables:
MNES IV
0.84 0.40
0.89 0.34
0.91 0.31
0.93 0.29
0.95 0.26
0.98 0.23
0.99 0.22
1.00 0.22
1.02 0.20
1.04 0.18
1.07 0.18
And I need to fit cubic splines to these elements, where MNES
is the object (X) and IV
is the image (Y).
I have successfully accomplished what I need through PROC IML but I am afraid this is not the most efficient solution.
Specifically, my intended output dataset is:
mnes iv
0.333 0.40
0.332 0.40 <- for mnes out of sample MNES range, copy first IV;
0.336 0.40
... ...
0.834 0.40
0.837 0.40
0.840 0.40
0.842 INTERPOLATION
0.845 INTERPOLATION
0.848 INTERPOLATION
...
1.066 INTERPOLATION
1.069 INTERPOLATION
1.072 INTERPOLATION
1.074 0.18
1.077 0.18 <- for mnes out of sample MNES range, copy last IV;
1.080 0.18
... ...
3.000 0.18
The necessary specifics are the following:
- I always have 1001 points for
MNES
, ranging from 0.(3) to 3 (thus, each step is (3-1/3)/1000). - The interpolation for
IV
should only be used for the points between the minimum and maximumMNES
. - For the points where
MNES
is greater than the maximumMNES
in the sample,IV
should be equal to theIV
of the maximumMNES
and likewise for the minimumMNES
(it is always sorted byMNES
).
My worry for efficiency is due to the fact that I have to solve this problem roughly 2 million times and right now it (the code below, using PROC IML) takes roughly 5 hours for 100k different input datasets.
My question is: What alternatives do I have if I wish to fit cubic splines given an input data set such as the one above and output it to a specific grid of objects? And what solution would be the most efficient?
- With PROC IML I can do exactly this with the splinev function, but I am concerned that using PROC IML is not the most efficient way;
- With PROC EXPAND, given that this is not a time series, it does not seem adequate. Additionally, I do not know how to specify the grid of objects which I need through PROC EXPAND;
- With PROC TRANSREG, I do not understand how to input a dataset into the knots and I do not understand whether it will output a dataset with the corresponding interpolation;
- With the MSPLINT function, it seems doable but I do not know how to input a data set to its arguments.
I have attached the code I am using below for this purpose and an explanation of what I am doing. Reading what is below is not necessary for answering the question but it could be useful for someone solving this sort of problem with PROC IML or wanting a better understanding of what I am saying.
I am replicating a methodology (Buss and Vilkov (2012)) which, among other things, applies cubic splines to these elements, where MNES
is the object (X) and IV
is the image (Y).
The following code is heavily based on the Model Free Implied Volatility (MFIV) MATLAB code by Vilkov for Buss and Vilkov (2012), available on his website.
The interpolation is a means to calculate a figure for stock return volatility under the risk-neutral measure, by computing OTM put and call prices. I am using this for the purpose of my master thesis. Additionally, since my version of PROC IML does not have functions for Black-Scholes option pricing, I defined my own.
proc iml;
* Define BlackScholes call and put function;
* Built-in not available in SAS/IML 9.3;
* Reference http://www.lexjansen.com/wuss/1999/WUSS99039.pdf ;
start blackcall(x,t,s,r,v,d);
d1 = (log(s/x) + ((r-d) + 0.5#(v##2)) # t) / (v # sqrt(t));
d2 = d1 - v # sqrt(t);
bcall = s # exp(-d*t) # probnorm(d1) - x # exp(-r*t) # probnorm(d2);
return (bcall);
finish blackcall;
start blackput(x,t,s,r,v,d);
d1 = (log(s/x) + ((r-d) + 0.5#(v##2)) # t) / (v # sqrt(t));
d2 = d1 - v # sqrt(t);
bput = -s # exp(-d*t) # probnorm(-d1) + x # exp(-r*t) # probnorm(-d2);
return (bput);
finish blackput;
store module=(blackcall blackput);
quit;
proc iml;
* Specify necessary input parameters;
currdate = "&currdate"d;
currpermno = &currpermno;
currsecid = &currsecid;
rate = &currrate / 100;
mat = &currdays / 365;
* Use inputed dataset and convert to matrix;
use optday;
read all var{mnes impl_volatility};
mydata = mnes || impl_volatility;
* Load BlackScholes call and Put function;
load module=(blackcall blackput);
* Define parameters;
k = 2;
m = 500;
* Define auxiliary variables according to Buss and Vilkov;
u = (1+k)##(1/m);
a = 2 * (u-1);
* Define moneyness (ki) and implied volatility (vi) grids;
mi = (-m:m);
mi = mi`;
ki = u##mi;
* Preallocation of vi with 2*m+1 ones (1001 in the base case);
vi = J(2*m+1,1,1);
* Define IV below minimum MNESS equal to the IV of the minimum MNESS;
h = loc(ki<=mydata[1,1]);
vi[h,1] = mydata[1,2];
* Define IV above maximum MNESS equal to the IV of the maximum MNESS;
h = loc(ki>=mydata[nrow(mydata),1]);
vi[h,1] = mydata[nrow(mydata),2];
* Define MNES grid where there are IV from data;
* (equal to where ki still has ones resulting from the preallocation);
grid = ki[loc(vi=1),];
* Call splinec to interpolate based on available data and obtain coefficients;
* Use coefficients to create spline on grid and save on smoothFit;
* Save smoothFit in correct vi elements;
call splinec(fitted,coeff,endSlopes,mydata);
smoothFit = splinev(coeff,grid);
vi[loc(vi=1),1] = smoothFit[,2];
* Define elements of mi corresponding to OTM calls (MNES >=1) and OTM puts (MNES <1);
ic = mi[loc(ki>=1)];
ip = mi[loc(ki<1)];
* Calculate call and put prices based on call and put module;
calls = blackcall(ki[loc(ki>=1),1],mat,1,rate,vi[loc(ki>=1),1],0);
puts = blackput(ki[loc(ki<1),1],mat,1,rate,vi[loc(ki<1),1],0);
* Complete volatility calculation based on Buss and Vilkov;
b1 = sum((1-(log(1+k)/m)#ic)#calls/u##ic);
b2 = sum((1-(log(1+k)/m)#ip)#puts/u##ip);
stddev = sqrt(a*(b1+b2)/mat);
* Append to voldata dataset;
edit voldata;
append var{currdate currsecid currpermno mat stddev};
close voldata;
quit;