
I am trying to fit vector autoregressive (VAR) models using the generalized linear models included in scikit-learn. The linear model has the form y = X w, but the system matrix X has a very peculiar structure: it is block-diagonal, and all blocks are identical. To save computation time and memory, the model can be expressed as Y = BW, where B is one block of X, and Y and W are now matrices instead of vectors. The classes LinearRegression, Ridge, RidgeCV, Lasso, and ElasticNet readily accept this matrix form. However, fitting LassoCV or ElasticNetCV fails because Y is two-dimensional.
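
The equivalence between the two forms can be checked with a small NumPy sketch (synthetic data; the sizes n, p, m are arbitrary choices for illustration). If y stacks the columns of Y and w stacks the columns of W, then X is the Kronecker product of an m×m identity with B, and ordinary least squares gives the same W either way, while X holds m² times as many entries as B.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 50, 4, 3  # samples, regressors per block, outputs (number of blocks)
B = rng.standard_normal((n, p))
W_true = rng.standard_normal((p, m))
Y = B @ W_true + 0.01 * rng.standard_normal((n, m))

# Compact form Y = B W: one least-squares solve with a 2-D right-hand side
W_compact, *_ = np.linalg.lstsq(B, Y, rcond=None)

# Expanded form y = X w: X is block-diagonal with m identical copies of B
X = np.kron(np.eye(m), B)          # shape (n*m, p*m)
y = Y.ravel(order="F")             # vec(Y): the columns of Y stacked
w, *_ = np.linalg.lstsq(X, y, rcond=None)
W_expanded = w.reshape((p, m), order="F")  # undo vec() to compare with W_compact

print(X.shape, np.allclose(W_compact, W_expanded))  # (150, 12) True
print(X.size // B.size)  # 9, i.e. m**2 times the storage of B
```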

I found https://github.com/scikit-learn/scikit-learn/issues/2402. From that discussion, I assume the behavior of LassoCV/ElasticNetCV is intentional. Is there a way to optimize the alpha/rho parameters other than implementing cross-validation manually?
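
Since ElasticNet itself accepts a 2-D Y, one way to avoid hand-written cross-validation is to wrap it in GridSearchCV. The sketch below assumes a recent scikit-learn (in 0.14, GridSearchCV lived in `sklearn.grid_search`; it is now in `sklearn.model_selection`), synthetic data, and an arbitrary grid. The parameter called rho in the question is `l1_ratio` in ElasticNet. One alpha is shared by all columns of Y, and unlike LassoCV this refits from scratch for every grid point instead of reusing a regularization path.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
n, p, m = 200, 6, 3
B = rng.standard_normal((n, p))
W_true = np.zeros((p, m))
W_true[:2] = rng.standard_normal((2, m))  # sparse ground truth: only 2 active regressors
Y = B @ W_true + 0.1 * rng.standard_normal((n, m))

param_grid = {
    "alpha": np.logspace(-3, 0, 8),
    "l1_ratio": [0.2, 0.5, 0.8, 1.0],  # l1_ratio=1.0 is the pure Lasso penalty
}
search = GridSearchCV(
    ElasticNet(max_iter=10_000),
    param_grid,
    cv=KFold(n_splits=5),  # contiguous folds, no shuffling
)
search.fit(B, Y)  # Y is 2-D: one column per equation
print(search.best_params_)
print(search.best_estimator_.coef_.shape)  # (3, 6): one row per column of Y
```

For time-ordered data, `sklearn.model_selection.TimeSeriesSplit` can be passed as `cv` instead of KFold.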

Furthermore, Bayesian regression techniques in scikit-learn also expect y to be one-dimensional. Is there any way around this?
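
A simple workaround is to fit one Bayesian model per column of Y. The sketch below uses `MultiOutputRegressor` (available since scikit-learn 0.18, so not in 0.14) around BayesianRidge on synthetic data. Note that each column then gets its own noise and weight precision estimates (`alpha_`, `lambda_`), whereas a single BayesianRidge on the expanded block-diagonal problem would share them across all equations.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(1)
n, p, m = 100, 4, 3
B = rng.standard_normal((n, p))
Y = B @ rng.standard_normal((p, m)) + 0.1 * rng.standard_normal((n, m))

# Fits m independent BayesianRidge models, one per column of Y
model = MultiOutputRegressor(BayesianRidge()).fit(B, Y)
W_hat = np.column_stack([est.coef_ for est in model.estimators_])  # shape (p, m)
print(W_hat.shape, model.predict(B).shape)  # (4, 3) (100, 3)
```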

Note: I use scikit-learn 0.14 (stable)

Why are you using regression models for an autoregressive process? What is the actual nature of your system: Y_t=F(Y_{t-1}), Y_t=F(Y_{t-1}, X_t) or Y_t=F(X_t)? - Andrey Shokhin
I forgot to mention that the AR process is linear with additive noise. So I suppose the nature of the system would be Y_t=F(Y_{t-1}, X_t), where F() is a linear function and X_t is white noise. - MB-F
Very good suggestion. statsmodels has all the functionality for one's everyday VAR needs. Unfortunately, there are reasons why I cannot use it: (1) I want to avoid the additional dependency. (2) I need to support regularized and sparse estimators, which are available in scikit-learn. - MB-F
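
For a linear AR process with additive white noise, B holds the lagged values and Y the current values. A NumPy sketch of that construction follows; `var_design` is a hypothetical helper written for illustration, and the coefficient matrix A is an arbitrary stable example. With x_t = A x_{t-1} + e_t written row-wise, Y = B Aᵀ, so the fitted W approximates Aᵀ.

```python
import numpy as np

def var_design(x, order):
    """Hypothetical helper: return (B, Y) for a VAR(order) model of x, shape (T, m).

    Row i of B is [x[t-1], x[t-2], ..., x[t-order]] and row i of Y is x[t],
    with t = order + i.
    """
    T = x.shape[0]
    B = np.hstack([x[order - k : T - k] for k in range(1, order + 1)])
    Y = x[order:]
    return B, Y

# Simulate a stable VAR(1): x_t = A x_{t-1} + white noise
rng = np.random.default_rng(0)
m, T = 3, 5000
A = np.array([[0.5, 0.1, 0.0],
              [0.0, 0.4, 0.2],
              [0.1, 0.0, 0.3]])
x = np.zeros((T, m))
for t in range(1, T):
    x[t] = A @ x[t - 1] + 0.1 * rng.standard_normal(m)

B, Y = var_design(x, order=1)
W, *_ = np.linalg.lstsq(B, Y, rcond=None)  # Y ≈ B W, so W ≈ A.T
print(np.round(W.T, 2))  # should be close to A
```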

2 Answers


How crucial is the performance and memory gain from this formulation of the regression? Given that your reformulation breaks parts of scikit-learn (LassoCV, ElasticNetCV, the Bayesian estimators), I wouldn't really call it an optimization... I would suggest:

  1. Running the unoptimized version and waiting (if possible).

  2. Pull the following code with git; it supposedly solves your problem. It is referenced in the scikit-learn GitHub discussion you linked. See here for instructions on building scikit-learn from a git checkout. You can then add the location of the branched scikit-learn to your Python path and run your regression with the modified library code. Be sure to post your experiences and any issues you encounter; I'm sure the scikit-learn developers would appreciate it.
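
One reading of option 1 is to build the expanded problem explicitly so that y is 1-D and LassoCV works unchanged. The sketch below assumes a recent scikit-learn and SciPy, synthetic zero-mean data, and stores X as a SciPy sparse matrix, so memory grows with m copies of B rather than m² dense blocks. Folds are shuffled because the stacked rows are grouped by equation; no intercept is fitted, since `fit_intercept=True` would give one intercept shared by all equations.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p, m = 200, 5, 3
B = rng.standard_normal((n, p))
W_true = np.zeros((p, m))
W_true[0] = 1.0  # only the first regressor is active in every equation
Y = B @ W_true + 0.1 * rng.standard_normal((n, m))

X = sparse.kron(sparse.eye(m), B, format="csc")  # block-diagonal: m copies of B
y = Y.ravel(order="F")                            # vec(Y): columns of Y stacked

# Shuffled folds, so each fold mixes rows from all equations
cv = KFold(n_splits=5, shuffle=True, random_state=0)
model = LassoCV(cv=cv, fit_intercept=False).fit(X, y)  # data are zero-mean by construction
W_hat = model.coef_.reshape((p, m), order="F")          # back to one column per equation
print(model.alpha_, W_hat.shape)
```

Because scikit-learn's Lasso objective averages the squared error over all n·m stacked rows, the alpha found here corresponds to an alpha m times larger in the compact per-column form.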