R: Time Series Regression with NA and multiple dependent variables

Question

I would like to run a time series regression in which the dependent variables are the columns of a data frame, regressing each column on the same set of independent variables. I know you can just use

lm(dataframe~independent variables)

because if the dependent variable is a matrix, lm will fit a separate regression for each column.
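For reference, a minimal sketch of the matrix-response form described above, using simulated data with no missing values. The names `x1`, `x2`, `Y`, `a` and `b` are made up for illustration; note that the response has to be a matrix, so a data frame would need `as.matrix()` first.

```r
set.seed(1)
n  <- 20
x1 <- rnorm(n)
x2 <- rnorm(n)

# Two hypothetical dependent variables stored as the columns of a matrix
Y <- cbind(a = 1 + 2 * x1 + rnorm(n),
           b = -1 + 0.5 * x2 + rnorm(n))

# A matrix on the left-hand side makes lm fit one regression per column
fit <- lm(Y ~ x1 + x2)

# Coefficients come back as a matrix: one row per term, one column per response
coef(fit)
```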

However, my dependent variables are information about stocks through time, and sometimes information is not available for every stock at every time point, so I have some NA values. The problem is that lm has to omit the NA values, i.e. it removes the whole row when running the regression. This is fine if I only want to run a regression on one dependent variable, but I have a list (1000+) of dependent variables on which I would like to run my regression. Because my dataset covers only 15+ years, there are missing values at every single time point, so when I run my lm regression I get an error because lm has removed every single row. The only way I can think of to solve this problem is to run a for loop with a separate regression for each stock, which I think will take a very long time to compute. The following is an example of my data:

              135081(P)   135084(P)    135090(P)   
1994-12-30           NA          NA           NA         
1995-01-02           NA          NA           NA          
1995-01-03     06864935          NA           NA        
1995-01-04           NA          NA  -0.05474644         
1995-01-05           NA          NA   0.20894900          
1995-01-06           NA -0.45672832  -0.02378632

So if I run a time series regression on this, I get an error because lm drops every single row.

So my question is, would there be another way to run a time series regression across a data frame with different DEPENDENT variables where the regression "skips" the NA for just the one particular dependent variable instead of skipping it for every other dependent variable as well?

I don't think using na.omit is correct because it removes the time series properties of my dataset and using na.action=NULL doesn't work because I have NA in my dataset. Thank you a lot for your help.

Hi, if I use na.action=NULL, I get the following error: "Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in foreign function call (arg 4)" — user2672759
I don't think there is a way to do it other than going column by column. Have you tried reducing the # of factors, e.g., through PCA? — Doctor Dan
You can exclude the NAs in each column using apply(dataframe, 2, function(x) x[!is.na(x)]). This will return a list with the non-NA values for each column, but then you must also index your independent variables to match each dependent variable... — holzben
Are you sure this is the right model for these data? I don't usually run time series analyses, but when I did, I got in trouble for using lm and ended up working with a consultant to run a Mann-Kendall test, for which there is now a package ?Kendall. — Nazer
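The column-by-column approach raised in the comments can be sketched roughly as follows. This is only an illustration on simulated data, not a recommendation over other methods: the names `X`, `Y`, `s1`, `s2` and `fit_one` are hypothetical, and it assumes that dropping the missing rows separately for each stock is acceptable. Using `na.action = na.exclude` means each fit uses only the rows where that stock is observed, while residuals are padded with NA so they stay aligned with the original time index.

```r
set.seed(42)
n <- 30

# Hypothetical independent variables, fully observed
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Hypothetical dependent variables with NAs in different rows
Y <- data.frame(s1 = 0.5 * X$x1 + rnorm(n),
                s2 = -0.3 * X$x2 + rnorm(n))
Y$s1[1:10] <- NA
Y$s2[c(5, 20:30)] <- NA

# Fit one regression for a single dependent variable;
# rows with NA are dropped for this column only
fit_one <- function(y) {
  d <- data.frame(y = y, X)
  lm(y ~ x1 + x2, data = d, na.action = na.exclude)
}

fits   <- lapply(Y, fit_one)       # one lm object per column
coefs  <- sapply(fits, coef)       # terms in rows, stocks in columns
n_used <- sapply(fits, nobs)       # rows actually used in each fit
res    <- sapply(fits, residuals)  # n x 2 matrix, NA where data were missing
```

With 1000+ columns this is still a loop over `lm` calls, but each call is a small least-squares fit, so it may well be faster than expected.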

Alexis · Accepted Answer · 2014-01-17T17:04:58

You might want to employ a multiple imputation method, using something like the Amelia II package (Amelia on CRAN), in order to properly account for the increased uncertainty in your estimates due to missingness, and also to help minimize the biases that result from case-wise deletion. See for example:

Honaker, J. and King, G. (2010). What to do about missing values in time-series cross-section data. American Journal of Political Science, 54(2):561–581.
