My colleague and I have two datasets, where each has 1 observation per day, but the days are not sequential within each dataset and are not consistent between the two datasets.
We convert each to a zoo object and merge them, keeping only the dates present in the first dataset: days with observations in the second dataset but not in the first are dropped from the time index, and where the second dataset has no observation for a date in the first, we fill the gap with the previous observation (using na.locf).
When we then run a regression, we do not get the expected one coefficient per independent variable. We're new to zoo (and I'm still new to R). We found related topics on SO here: Error when doing linear regression using zoo objects ... Error in `$<-.zoo`(`*tmp*` and here: regressions with xts in R, but neither seems to match our situation, even though we tried various suggestions from each, including using the dyn package.
We continue to get unrecognizable results when we run summary(regression_model). Our primary question is how we get a regression with the typical summary statistics. A secondary question is whether each independent variable in the regression must be expressed as a lag (with k=0), as seems to be suggested by the dyn documentation.
An example drawn from our two much larger datasets follows:
# price data for stock XYZ arranged by date
Date <- c("1/2/13", "1/3/13", "1/4/13", "1/7/13", "1/8/13", "1/9/13", "1/10/13", "1/11/13", "1/14/13", "1/15/13")
XYZ <- c(65.73, 66.85, 66.92, 66.60, 66.07, 65.90, 66.06, 66.11, 65.12, 65.06)
# Nx data arranged by date, but dates are slightly different than stock price data
N.Date <- c("1/2/13", "1/3/13", "1/4/13", "1/6/13", "1/7/13", "1/8/13", "1/10/13", "1/11/13", "1/12/13", "1/14/13")
ACR <- c(50.2, NA, 35.2, 67.9, NA, NA, 42.5, 45.1, 34.0, 61.9)
BCR <- c(14.3, NA, 16.5, 22.1, NA, NA, 18.4, 24.2, 19.8, 15.4)
CCR <- c(00.0, NA, 33.6, 41.2, NA, NA, 25.6, 00.0, 11.3, 32.0)
# create data.frames
stock <- data.frame(Date, XYZ)
Nx <- data.frame(N.Date, ACR, BCR, CCR)
# create zoo objects from data.frames
z.stock <- zoo(stock, as.Date(stock[, 1], format = "%m/%d/%y"))
z.Nx <- zoo(Nx, as.Date(Nx[, 1], format = "%m/%d/%y"))
# merge zoo objects, dropping dates that appear only in the Nx zoo object
z.merge <- merge(z.stock, z.Nx, all = c(TRUE, FALSE))
# replace missing data with last observation carried forward
nmd <- na.locf(z.merge, maxgap = Inf)
# run regression
mdl <- dyn$lm(nmd$XYZ ~ lag(nmd$XYZ, -1) + lag(nmd$ACR, 0) + lag(nmd$BCR, 0) + lag(nmd$CCR, 0))
summary(mdl)
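One thing we have not ruled out, shown here as a minimal sketch rather than a confirmed fix: it assumes the odd output comes from the Date column being carried into the zoo objects, which would make the stored data character rather than numeric (this rests on the assumption that zoo keeps a data.frame as a single matrix). The names `z.stock2`, `z.Nx2`, `nmd2`, `df`, `XYZ.lag1`, and `mdl2` are made up for this illustration, and the lag is built by hand in base R instead of with dyn.

```r
library(zoo)

# Same example data as above
Date <- c("1/2/13", "1/3/13", "1/4/13", "1/7/13", "1/8/13",
          "1/9/13", "1/10/13", "1/11/13", "1/14/13", "1/15/13")
XYZ <- c(65.73, 66.85, 66.92, 66.60, 66.07, 65.90, 66.06, 66.11, 65.12, 65.06)
N.Date <- c("1/2/13", "1/3/13", "1/4/13", "1/6/13", "1/7/13",
            "1/8/13", "1/10/13", "1/11/13", "1/12/13", "1/14/13")
ACR <- c(50.2, NA, 35.2, 67.9, NA, NA, 42.5, 45.1, 34.0, 61.9)
BCR <- c(14.3, NA, 16.5, 22.1, NA, NA, 18.4, 24.2, 19.8, 15.4)
CCR <- c(0.0, NA, 33.6, 41.2, NA, NA, 25.6, 0.0, 11.3, 32.0)

# Diagnostic: a text column forces the whole matrix to character
is.character(as.matrix(data.frame(Date, XYZ)))

# Build zoo objects from the numeric columns only; dates go in the index
z.stock2 <- zoo(XYZ, as.Date(Date, format = "%m/%d/%y"))
z.Nx2 <- zoo(cbind(ACR, BCR, CCR), as.Date(N.Date, format = "%m/%d/%y"))

# Keep only the stock dates, then carry the last Nx observation forward
nmd2 <- na.locf(merge(XYZ = z.stock2, z.Nx2, all = c(TRUE, FALSE)))

# Plain data.frame for lm(); names are set explicitly so the formula
# does not depend on how merge() labels the columns
df <- as.data.frame(nmd2)
names(df) <- c("XYZ", "ACR", "BCR", "CCR")

# Lag by one observation (previous row, not previous calendar day)
df$XYZ.lag1 <- c(NA, head(df$XYZ, -1))

# lm() drops the first row, where the lag is NA
mdl2 <- lm(XYZ ~ XYZ.lag1 + ACR + BCR + CCR, data = df)
summary(mdl2)
```

If that assumption holds, `summary(mdl2)` should list an intercept plus one coefficient per regressor, and the non-lagged variables enter the formula directly, without being wrapped in `lag(..., 0)`.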