3
votes

My colleague and I have two datasets, where each has 1 observation per day, but the days are not sequential within each dataset and are not consistent between the two datasets.

We convert each to a zoo object and merge them, dropping from the time index the days that have observations in the second dataset but not in the first. In the opposite case (days present in the first dataset but not in the second), we fill the missing observations in the second dataset with the previous observation (using na.locf).

When we then run a regression, we do not get the expected one coefficient per independent variable. Because we're new to zoo (and I'm still new to R), we searched for related topics on SO and found these: Error when doing linear regression using zoo objects ... Error in `$<-.zoo`(`*tmp*` and regressions with xts in R. Neither seems to match our case, though we tried various suggestions made in each, including using the dyn package.

We continue to get unrecognizable results when we run summary(regression_model). Our primary question is how we get a regression with the typical summary statistics. A secondary question is whether each independent variable in the regression must be expressed as a lag (with k=0), as seems to be suggested by the dyn documentation.

An example drawn from our two much larger sets of data follows:

price data for stock XYZ arranged by date

Date <- c("1/2/13", "1/3/13", "1/4/13", "1/7/13", "1/8/13", "1/9/13", "1/10/13", "1/11/13", "1/14/13", "1/15/13")
XYZ <- c(65.73, 66.85, 66.92, 66.60, 66.07, 65.90, 66.06, 66.11, 65.12, 65.06)

Nx data arranged by date, but dates are slightly different than stock price data

N.Date <- c("1/2/13", "1/3/13", "1/4/13", "1/6/13", "1/7/13", "1/8/13", "1/10/13", "1/11/13", "1/12/13", "1/14/13")
ACR <- c(50.2, NA, 35.2, 67.9, NA, NA, 42.5, 45.1, 34.0, 61.9)
BCR <- c(14.3, NA, 16.5, 22.1, NA, NA, 18.4, 24.2, 19.8, 15.4)
CCR <- c(00.0, NA, 33.6, 41.2, NA, NA, 25.6, 00.0, 11.3, 32.0)

create data.frames

stock <- data.frame (Date, XYZ)
Nx <- data.frame (N.Date, ACR, BCR, CCR)

create zoo objects from data.frames

z.stock <- zoo(stock, as.Date(stock[, 1], format = "%m/%d/%y"))
z.Nx <- zoo(Nx, as.Date(Nx[, 1], format = "%m/%d/%y"))

merge zoo objects, eliminating any missing data from the Nx zoo object

z.merge <- merge(z.stock, z.Nx, all = c(TRUE, FALSE))

replace missing data with last observation carried forward

nmd <- na.locf(z.merge, maxgap = Inf)

run regression

mdl <- dyn$lm(nmd$XYZ ~ lag(nmd$XYZ, -1) + lag(nmd$ACR, 0) + lag(nmd$BCR, 0) + lag(nmd$CCR, 0))
summary(mdl)

1 Answer

2
votes

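The code in the question places the dates both in the data and in the index, rather than only in the index. A zoo object stores its data as a matrix, so including the Date column alongside the numeric columns coerces every column to character, and the regression no longer sees numeric variables.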
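One way to check whether the date column is the culprit is to compare the storage type of the two constructions. The following is a minimal sketch using only the first three observations from the question; the names `idx`, `z.bad` and `z.good` are illustrative and not part of the original code.

```r
library(zoo)

# First three observations from the question's stock data
Date <- c("1/2/13", "1/3/13", "1/4/13")
XYZ  <- c(65.73, 66.85, 66.92)
idx  <- as.Date(Date, format = "%m/%d/%y")

# Dates kept as a data column as well as the index (as in the question)
z.bad <- zoo(data.frame(Date, XYZ), idx)

# Dates used only as the index (as in this answer)
z.good <- zoo(cbind(XYZ), idx)

# Inspect how each object stores its data
print(mode(coredata(z.bad)))
print(mode(coredata(z.good)))
```

If the first call does not report a numeric mode, the price column is no longer numeric inside the zoo object, which would explain the unexpected regression output.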

Try this:

library(zoo)  # zoo(), merge(), na.locf()
library(dyn)  # dyn$lm()

fmt <- "%m/%d/%y"
z.stock <- zoo(cbind(XYZ), as.Date(Date, fmt))
z.Nx <- zoo(cbind(ACR, BCR, CCR), as.Date(N.Date, fmt))

z.merge <- merge(z.stock, z.Nx, all = c(TRUE, FALSE))
nmd <- na.locf(z.merge)

dyn$lm(XYZ ~ lag(XYZ, -1) + ACR + BCR + CCR, nmd)
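To get the usual summary statistics, the fitted model can be stored and passed to `summary()`. The following is a self-contained sketch that repeats the question's sample data and the steps above; it assumes the zoo and dyn packages are installed, and the name `mdl` is taken from the question. Only XYZ is lagged, and the other regressors are used by name, so wrapping them in `lag(..., 0)` should not be necessary.

```r
library(zoo)
library(dyn)

# Sample data from the question
Date   <- c("1/2/13", "1/3/13", "1/4/13", "1/7/13", "1/8/13",
            "1/9/13", "1/10/13", "1/11/13", "1/14/13", "1/15/13")
XYZ    <- c(65.73, 66.85, 66.92, 66.60, 66.07, 65.90, 66.06, 66.11, 65.12, 65.06)
N.Date <- c("1/2/13", "1/3/13", "1/4/13", "1/6/13", "1/7/13",
            "1/8/13", "1/10/13", "1/11/13", "1/12/13", "1/14/13")
ACR <- c(50.2, NA, 35.2, 67.9, NA, NA, 42.5, 45.1, 34.0, 61.9)
BCR <- c(14.3, NA, 16.5, 22.1, NA, NA, 18.4, 24.2, 19.8, 15.4)
CCR <- c(0.0, NA, 33.6, 41.2, NA, NA, 25.6, 0.0, 11.3, 32.0)

# Dates go only in the index; the data part stays a numeric matrix
fmt     <- "%m/%d/%y"
z.stock <- zoo(cbind(XYZ), as.Date(Date, fmt))
z.Nx    <- zoo(cbind(ACR, BCR, CCR), as.Date(N.Date, fmt))

# Keep every stock date, drop dates that appear only in Nx, then fill gaps
nmd <- na.locf(merge(z.stock, z.Nx, all = c(TRUE, FALSE)))

# Store the fit so that summary() can be applied to it
mdl <- dyn$lm(XYZ ~ lag(XYZ, -1) + ACR + BCR + CCR, data = nmd)
print(summary(mdl))
```

With only ten sample rows the estimates themselves are not meaningful; the point is that the coefficient table should show an intercept plus one row per regressor.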