I am trying to create a set of rolling covariance matrices from financial data (window size = 60). My data, returns, is a 125x3 DataFrame.

import pandas as pd

# returns: the 125x3 DataFrame of returns described above
roll_rets = returns.rolling(window=60)
Omega = roll_rets.cov()

Omega is a 375x3 DataFrame with what looks like a MultiIndex, i.e. there are 3 rows for each timestamp.
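To see where the 375 rows come from, here is a minimal sketch that substitutes random data for returns (an assumption: 125 rows, columns a, b, c, a fixed seed, dates starting 2010-01-01). It only inspects the shape and index of the rolling covariance result.

```python
import numpy as np
import pandas as pd

# Random stand-in for the asker's 125x3 returns DataFrame (not real data)
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.standard_normal((125, 3)),
                       index=pd.date_range("2010-01-01", periods=125),
                       columns=list("abc"))

Omega = returns.rolling(window=60).cov()

print(Omega.shape)          # (375, 3): 125 dates x 3 rows each, 3 columns
print(Omega.index.nlevels)  # 2: level 0 is the date, level 1 is the column name
print(Omega.tail(6))        # the last two dates, one 3x3 block each
```

So the result is one 3x3 block per date stacked vertically, with the date repeated on the outer index level.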

What I actually want is a set of 66 3x3 covariance matrices (one for each complete window), but I can't work out how to iterate over returns correctly to do this. I think I'm missing something obvious. Thanks.

Question 1: Re "a set of 125 3x3 covariance matrices": a rolling window of length 60 on 125 observations gives you 66 complete windows (125 - 60 + 1), so 66 3x3 matrices. Is this what you mean? Question 2: what format do you want the output in? A NumPy array? In its current form, the MultiIndex DataFrame is already a stack of 125 3x3 DataFrames, one per date, each of which is a covariance matrix; the first 59 are all NaN because their windows have fewer than 60 observations. - Brad Solomon
On 1, yes, that's what I mean; I will edit the question to reflect it. On 2, I'd ideally like a dictionary of covariance matrices, or something I can iterate over easily. The covariance matrices are just one parameter in a model, so I need to multiply them against vectors etc. later in the code. - Johnnyh101
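The split between NaN and complete windows can be checked directly. This sketch again assumes random stand-in data with a fixed seed; it counts the dates whose 3x3 block is fully populated and pulls out one block with .loc on the date level.

```python
import numpy as np
import pandas as pd

# Random stand-in for the 125x3 returns DataFrame (an assumption, not the asker's data)
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.standard_normal((125, 3)),
                       index=pd.date_range("2010-01-01", periods=125),
                       columns=list("abc"))
Omega = returns.rolling(60).cov()

# A date holds a complete covariance matrix only if its 3x3 block has no NaN
complete = Omega.notna().all(axis=1).groupby(level=0).all()
print(complete.sum())     # 66 complete windows (125 - 60 + 1)
print((~complete).sum())  # 59 leading dates with fewer than 60 observations

# Selecting one date on level 0 returns that window's 3x3 covariance DataFrame
last = Omega.loc[returns.index[-1]]
print(last.shape)         # (3, 3)
```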

Answer

Firstly, a MultiIndex DataFrame is an iterable object (try bool(pd.DataFrame.__iter__)). There are several Stack Overflow questions on iterating through the sub-frames of a MultiIndex DataFrame, if you are interested.
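One caveat: iterating over a DataFrame directly yields its column labels, not sub-frames. A common way to walk the per-date blocks is groupby on the outer index level. The sketch below assumes random stand-in data; droplevel is used only to strip the repeated date so each block is indexed by a, b, c.

```python
import numpy as np
import pandas as pd

# Random stand-in for the 125x3 returns DataFrame (an assumption)
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.standard_normal((125, 3)),
                       index=pd.date_range("2010-01-01", periods=125),
                       columns=list("abc"))
Omega = returns.rolling(60).cov().dropna()

# Plain iteration over a DataFrame yields its column labels
print(list(Omega))  # ['a', 'b', 'c']

# groupby(level=0) yields one (date, block) pair per date; dropping the
# repeated date level leaves a 3x3 frame indexed by a, b, c
covs = {date: block.droplevel(0) for date, block in Omega.groupby(level=0)}
print(len(covs))  # 66
```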

But to your question directly, here is a dict: the keys are the (end) dates, and each value is a 3x3 NumPy array.

import pandas as pd
import numpy as np

Omega = (pd.DataFrame(np.random.randn(125,3), 
                      index=pd.date_range('1/1/2010', periods=125),
                      columns=list('abc'))
         .rolling(60)
         .cov()
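         .dropna())  # drops the first 59 all-NaN windows, leaving 66 of the 125

# level 0 repeats each date 3 times (once per row), so keep unique values only;
# equivalently returns.index[59:], since the first 59 windows were dropped
dates = Omega.index.get_level_values(0).unique()
d = {date: Omega.loc[date].values for date in dates}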
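As a sanity check on what each dict entry holds, the matrix stored under an end date should match the ordinary sample covariance of the 60 rows ending on that date. This sketch assumes random stand-in data with a fixed seed (rather than the unseeded np.random.randn) so it is reproducible; both rolling cov and DataFrame.cov use ddof=1 by default.

```python
import numpy as np
import pandas as pd

# Seeded random stand-in for the returns data (an assumption)
rng = np.random.default_rng(42)
returns = pd.DataFrame(rng.standard_normal((125, 3)),
                       index=pd.date_range("2010-01-01", periods=125),
                       columns=list("abc"))
Omega = returns.rolling(60).cov().dropna()

dates = Omega.index.get_level_values(0).unique()
d = {date: Omega.loc[date].values for date in dates}

# Recompute the last window by hand: the 60 rows ending at its end date
end = dates[-1]
window = returns.loc[:end].tail(60)
print(np.allclose(d[end], window.cov().values))  # True
print(len(d))                                     # 66
```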

Is this efficient? Not very: you are creating a separate NumPy array for each value of the dict, each with its own dtype and other metadata. The DataFrame as it is now is arguably well suited to your purpose. But another option is to create a single 3-D NumPy array by reshaping Omega.values:

Omega.values.reshape(66, 3, 3)

Here each element along the first axis is a 3x3 matrix (again easily iterable, but without the date indexing you had in your DataFrame).
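If you want the 3-D array but still need lookups by date, one option is to keep the unique dates alongside it. This sketch relies on the rolling cov output being ordered date-major (rows a, b, c under each date), so consecutive triples of rows form one matrix; it assumes random stand-in data and uses -1 so NumPy infers the number of windows instead of hard-coding 66.

```python
import numpy as np
import pandas as pd

# Random stand-in for the returns data (an assumption)
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.standard_normal((125, 3)),
                       index=pd.date_range("2010-01-01", periods=125),
                       columns=list("abc"))
Omega = returns.rolling(60).cov().dropna()

# Consecutive triples of rows become one 3x3 matrix; -1 infers the count
cube = Omega.values.reshape(-1, 3, 3)
dates = Omega.index.get_level_values(0).unique()

# Pair each matrix with its end date to restore lookup by date
by_date = dict(zip(dates, cube))
print(cube.shape)  # (66, 3, 3)
```

The dict values here are views into the single cube array, so the matrices share one buffer rather than being separate copies.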

Omega.values.reshape(66, 3, 3)[-1] # last matrix/final date
Out[29]: 
array([[ 0.80865977, -0.06134767,  0.04522074],
       [-0.06134767,  0.67492558, -0.12337773],
       [ 0.04522074, -0.12337773,  0.72340524]])
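Since the matrices are later multiplied against vectors, the 3-D array lets you do that for every date at once. This is a sketch under stated assumptions: random stand-in data and a hypothetical weight vector w. The @ operator broadcasts over the leading axis, and np.einsum computes the quadratic form w' Omega_t w for each date t, which is then re-attached to the dates as a Series.

```python
import numpy as np
import pandas as pd

# Random stand-in for the returns data (an assumption)
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.standard_normal((125, 3)),
                       index=pd.date_range("2010-01-01", periods=125),
                       columns=list("abc"))
Omega = returns.rolling(60).cov().dropna()
cube = Omega.values.reshape(-1, 3, 3)
dates = Omega.index.get_level_values(0).unique()

w = np.array([0.5, 0.3, 0.2])  # hypothetical weight vector

# Matrix-vector product for every date at once: row t is cube[t] @ w
sigma_w = cube @ w  # shape (66, 3)

# Quadratic form w' Omega_t w for every date, as a date-indexed Series
port_var = pd.Series(np.einsum("i,tij,j->t", w, cube, w), index=dates)
print(port_var.tail())
```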