3
votes

I am not sure why I am getting <NA> in the index when I use MATCH with a zoo object. Suppose I have the following:

a <- read.zoo(data.frame(date=as.Date('2011-12-31') + 0:49, col1=seq(1,50), col2=seq(11,60)), FUN = as.Date)
mon <- read.zoo(data.frame(date=c(as.Date('2012-01-01'), as.Date('2012-02-01'), as.Date('2012-03-01')), mc=letters[1:3], mc2=LETTERS[1:3]), FUN = as.Date)

Then I try to match:

mon$matched <- a[MATCH(index(mon),index(a))]$col1

Then I tried to view what mon now looks like and get an error:

View(mon)
Error in View : missing values in 'row.names' are not allowed

Looking at mon further I am not sure where the extra <NA> row came from:

mon
           mc   mc2  matched
2012-01-01 a    A    2      
2012-02-01 b    B    33     
2012-03-01 c    C    <NA>   
<NA>       <NA> <NA> <NA>   

What is the proper way to do this match? The result is correct except for that last row where all values are <NA>. I must be doing something fundamentally wrong here...

2
So basically left_join a to mon? If they were xts objects, merge.xts(mon, a, join = "left"). Not sure why you get the NAs. Maybe @G. Grothendieck will drop by on this question and he might know why you get the NAs. - phiver
Missing comma after last closing paren in a[MATCH(index(mon),index(a))]$col1, and there was no match for the last date in a, hence the NA. - IRTFM
@42- The comma didn't change anything. The no match for the last date is fine. I am specifically trying to figure out why the last row has NA for the index and all data. - Denis

2 Answers

2
votes

It seems you are trying to create a left join. For that one normally uses merge. The two elements of the argument all = c(TRUE, FALSE) shown in the code below refer to whether we keep unmatched dates in mon and a respectively.
library(zoo)

a <- zoo(cbind(col1 = 1:50, col2 = 11:60), as.Date("2011-12-31") + 0:49)
mon <- zoo(cbind(mc = letters[1:3], mc2 = LETTERS[1:3]), 
           as.Date(c('2012-01-01', '2012-02-01', '2012-03-01')))

merge(mon, a, all = c(TRUE, FALSE))

giving:

           mc mc2 col1 col2
2012-01-01 a  A   2    12  
2012-02-01 b  B   33   43  
2012-03-01 c  C   <NA> <NA>

If you only want col1 then:

merge(mon, a, all = c(TRUE, FALSE))$col1

If you don't need the row with the NA then specify FALSE to eliminate unmatched dates from both mon and a:

merge(mon, a, all = FALSE)

INDEXING BY TIME

This can also be done by using time indexing like this:

result <- mon
result$col1 <- a$col1[time(mon)]  # does an implicit merge
result

giving:

           mc mc2 col1
2012-01-01 a  A   2   
2012-02-01 b  B   33  
2012-03-01 c  C   <NA>

If you don't need the NA row then this would be sufficient:

a[time(mon)]

giving:

           col1 col2
2012-01-01    2   12
2012-02-01   33   43

MATCH

1) Although the above approaches are recommended over MATCH, if you do want to use MATCH for some reason, then add the nomatch = 0 argument so that it returns 0 instead of NA for non-matches. A 0 subscript selects nothing, so the indexing simply drops that value. The assignment to result$col1 then does an implicit merge, filling in an NA for the unmatched date.

result <- mon
result$col1 <- a$col1[MATCH(time(mon), time(a), nomatch = 0)]
result

giving:

           mc mc2 col1
2012-01-01 a  A   2   
2012-02-01 b  B   33  
2012-03-01 c  C   <NA>

result$col1 can be used to get just col1.
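As a rough illustration of why nomatch = 0 drops the unmatched date, here is a base R sketch that uses a named vector in place of the zoo series. The names stand in for the time index; x, wanted, pos0 and kept are hypothetical names used only for this example, and zoo's own indexing may differ in detail.

```r
# Base R stand-in: the names play the role of the zoo time index.
x <- c("2012-01-01" = 2, "2012-02-01" = 33)
wanted <- c("2012-01-01", "2012-02-01", "2012-03-01")

# With nomatch = 0 the missing date maps to position 0 rather than NA.
pos0 <- match(wanted, names(x), nomatch = 0)
pos0

# A 0 subscript selects nothing, so only the two matched elements remain.
kept <- x[pos0]
kept
```

Because the unmatched position contributes no element at all, no NA entry is created in the names.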

2) Another way to do this is the following, which gives the same result. In this case the right hand side has three elements, the third being NA, but since the right hand side is now a plain vector it is just copied element by element into result$col1 rather than going through an implicit merge.

result <- mon
result$col1 <- coredata(a$col1)[MATCH(time(mon), time(a))]
result

Other

Note that what is referred to as row.names in the question is the time index, not row names.
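A small sketch of how one might inspect the time index directly, assuming the zoo package is installed. It rebuilds mon as in the code above and checks the index for missing values, which is what the View error in the question is complaining about.

```r
library(zoo)

mon <- zoo(cbind(mc = letters[1:3], mc2 = LETTERS[1:3]),
           as.Date(c("2012-01-01", "2012-02-01", "2012-03-01")))

# The dates live in the time index, accessed with index() or time().
idx <- index(mon)
class(idx)

# A quick check for a missing index entry such as the <NA> row in the question.
anyNA(idx)
```

Running the same anyNA(index(...)) check on the object from the question should flag the extra row.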

2
votes

If you look at the a object, you find that the dates end at 2012-02-18, before the last date in mon (2012-03-01):

> a
           col1 col2
2011-12-31    1   11
2012-01-01    2   12
<snipped most of them>
2012-02-16   48   58
2012-02-17   49   59
2012-02-18   50   60

So while creating matched you got:

 MATCH(index(mon),index(a))
[1]  2 33 NA

That NA is what created the row of all NAs:

a[MATCH(index(mon),index(a)) ]
 #--------
           col1 col2
2012-01-01    2   12
2012-02-01   33   43
<NA>         NA   NA

From which you picked the col1 items:

a[MATCH(index(mon),index(a))]$col1
#2012-01-01 2012-02-01       <NA> 
#         2         33         NA 
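The same effect can be reproduced in base R, which may make the mechanism easier to see. This is only an analogy, with a named vector standing in for the zoo series (x, wanted, pos and picked are hypothetical names for this sketch): an NA subscript yields an element whose value and name are both NA.

```r
# Base R stand-in: the names play the role of the zoo time index.
x <- c("2012-01-01" = 2, "2012-02-01" = 33)
wanted <- c("2012-01-01", "2012-02-01", "2012-03-01")

# The third date is absent, so match() returns NA in that position.
pos <- match(wanted, names(x))
pos

# Subscripting with NA produces an extra element with an NA name.
picked <- x[pos]
picked
names(picked)
```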

The [<- function in the zoo package is quite different from ordinary [<- methods. You can examine the code with:

 getAnywhere(`[<-.zoo` ) 

It checks the number of arguments, determines which ones you supplied, and changes its logic accordingly. In a case such as yours, where only the x and i arguments are given, it does a matching process that results in an extra entry in the index vector and therefore an extra row being created. Arguably this is not intended behavior, and arguably an na.omit should have been applied at some point in the process. One of the zoo authors, @G. Grothendieck, is a regular here and may be able to comment further. If so, his word is Law. If you apply na.omit yourself, you get the expected result:

mon$matched <- na.omit(a[MATCH(index(mon),index(a))]$col1)

> mon
           mc mc2 matched
2012-01-01 a  A   2      
2012-02-01 b  B   33     
2012-03-01 c  C   <NA>