7
votes

I have to xts/zoo objects. each has measures of different variables over a different time span. I want to create a single time series comprising all measures at all times, with NAs for missing dates/variable combinations. how do I do that? artificial example:

library(xts)
x<-cbind(a=1:3,b=3:1)
rownames(x) = as.character(Sys.Date()-1:3)

y<-cbind(a=5:7,c=3:1)
rownames(y) = as.character(Sys.Date()-5:7)

xs=as.xts(x)
ys=as.xts(y)

#now what?

#desired outcome looks like:
            a  b   c
2013-03-10  7 NA   1
2013-03-11  6 NA   2
2013-03-12  5 NA   3
2013-03-14  3  1  NA
2013-03-15  2  2  NA
2013-03-16  1  3  NA

# regular merge looks like that (adding an a.1 variable)
merge(xs,ys)
            a  b a.1  c
2013-03-10 NA NA   7  1
2013-03-11 NA NA   6  2
2013-03-12 NA NA   5  3
2013-03-14  3  1  NA NA
2013-03-15  2  2  NA NA
2013-03-16  1  3  NA NA

# simple concatenation ignores variable names and looks like that
c(xs,ys)
           a b
2013-03-10 7 1
2013-03-11 6 2
2013-03-12 5 3
2013-03-14 3 1
2013-03-15 2 2
2013-03-16 1 3

# so what should I do?
4
what happens if xs and ys both have value for a?CHP
for my purposes its ok if xs "wins", and the relevant data point in ys is thrown/ignored. I'm open to other behaviors as well.amit
xts has a mergefunction that does inner, outer, left and right merges. But your problem is not specific to any of them. you'll have to write your own function.Arun

4 Answers

4
votes

This is not a general solution. But works for this example:

cbind(rbind(xs[,1],ys[,1]), cbind(xs[,-1],ys[,-1]))
           a  b  c
2013-03-10 7 NA  1
2013-03-11 6 NA  2
2013-03-12 5 NA  3
2013-03-14 3  1 NA
2013-03-15 2  2 NA
2013-03-16 1  3 NA

Just remind that cbind.xts is just merge.xts. S you can get the same result using merge

merge(rbind(xs[,1],ys[,1]), merge(xs[,-1],ys[,-1]))
           a  b  c
2013-03-10 7 NA  1
2013-03-11 6 NA  2
2013-03-12 5 NA  3
2013-03-14 3  1 NA
2013-03-15 2  2 NA
2013-03-16 1  3 NA

The problem with this solution is that if ys and xs have some dates incommon, you will have duplicated index in your final xts object.For example, if we replace y :

rownames(y) = as.character(Sys.Date()-3:5)

You get , a duplicated index for 2013-03-14, So im anot sure that it s a valid xts object.

merge(rbind(xs[,1],ys[,1]), merge(xs[,-1],ys[,-1]))
           a  b  c
2013-03-12 7 NA  1
2013-03-13 6 NA  2
2013-03-14 3  1  3
2013-03-14 5 NA NA
2013-03-15 2  2 NA
2013-03-16 1  3 NA

EDIT a generalization of the solution:

inter <- intersect(names(ys), names(xs))
diffx <- setdiff(names(xs),inter)
diffy <- setdiff(names(ys),inter)

merge(rbind(xs[,inter],ys[,inter]), merge(xs[,diffx],ys[,diffy]))


           a  b  c
2013-03-10 7 NA  1
2013-03-11 6 NA  2
2013-03-12 5 NA  3
2013-03-14 3  1 NA
2013-03-15 2  2 NA
2013-03-16 1  3 NA
2
votes

What you want => merge(data.frame(x,d),data.frame(y,d),by=c("d","a"),all=T)

You should use data.frame not name vectors/matrix, here is a general solution, what you want is just a one liner with a full outer join (look at ?merge)

x<-cbind(a=1:3,b=3:1)
d= as.character(Sys.Date()-1:3)
DT1 = data.frame(x,d)
#DT1
#   a b          d
#1: 1 3 2013-03-16
#2: 2 2 2013-03-15
#3: 3 1 2013-03-14

y<-cbind(a=5:7,c=3:1)
d = as.character(Sys.Date()-5:7)
DT2 = data.frame(y,d)
#DT2
#   a b          d
#1: 1 3 2013-03-12
#2: 2 2 2013-03-11
#3: 3 1 2013-03-10
merge(DT1,DT2,by=c("d","a"),all=T)
#           d a  b  c
#1 2013-03-10 7 NA  1
#2 2013-03-11 6 NA  2
#3 2013-03-12 5 NA  3
#4 2013-03-14 3  1 NA
#5 2013-03-15 2  2 NA
#6 2013-03-16 1  3 NA
1
votes

ok. spent some time thinking on this. because eventually I need to "merge" many such dataframes/xts into one, and not just merging two of them, I thought that it makes sense to do all this in one step: create a big matrix of all date/var combinations. then plugging into this big matrix all the observed data, object by object. the code looks as follows (will be happy to get comments on it, and feel free to use, without any kind of warranty, of course):

alltogether = function(dlist) {
    all.vars = unique(unlist(lapply(dlist,colnames)))
    all.obs = unique(unlist(lapply(dlist,rownames)))    
    res = array(NA,dim=c(length(all.obs),length(all.vars)),
          dimnames=list(all.obs,all.vars))
    for(d in dlist) {
            res[rownames(d),colnames(d)]=d
    }
    return(res)
}

alltogether.xts = function(xlist) {
    dlist = lapply(xlist,as.matrix)
    res = alltogether(dlist)
    xres = as.xts(res)
    return(xres)
}
0
votes

I would say to convert it to an numerical array (as.numeric(ts)), concatenate it with cbind(ts1,ts2) and then go back to the time series, ts(c(as.numeric(ts1),as.numeric(ts2))