How to keep the index when merging two data frames in R

Question

I have two data frames, df1 and df2, each with columns a and b.

I then merge df1 and df2 to get df3. Note that the merge is by=c("a","b"):

df3<-merge(df1,df2)
> df3
  a b
1 1 2
2 2 3
3 3 4

I would like to get the index of the rows in df1 that are selected by the merge, and add a column called "label" to df1:

> df1
  a b label
1 1 2  TRUE
2 2 3  TRUE
3 2 4 FALSE
4 3 4  TRUE
5 4 4 FALSE

I tried this:

df1$label<-apply(df1,1,function (x) ifelse(nrow(merge(x,df3))>0,TRUE,FALSE))

I got the wrong result, and it is very slow because my df1 is very large. Is there an easier way, something like is.element for vectors? Thank you.

Tim Biegeleisen · Accepted Answer · 2015-03-30T05:59:58

Merge on a and b by doing the equivalent of a LEFT OUTER JOIN in SQL, and then assign non-matching rows the value FALSE:

df1 <- data.frame(a=c(1,2,2,3,4), b=c(2,3,4,4,4))
df2 <- data.frame(a=c(1,1,2,3,5), b=c(1,2,3,4,5))
df2$label <- TRUE                                  # every row of df2 counts as a match
df3 <- merge(df1, df2, by=c("a", "b"), all.x=TRUE) # left join on a AND b, keeps all rows of df1
df3$label[is.na(df3$label)] <- FALSE               # rows of df1 without a match get FALSE

Output:

> df3
  a b label
1 1 2  TRUE
2 2 3  TRUE
3 2 4 FALSE
4 3 4  TRUE
5 4 4 FALSE
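
Since the question asks for the row index in df1 and for something like is.element, here is a minimal sketch of two alternatives that leave df1's row order untouched. It is not part of the accepted answer: the helper column row_id, the variable names, and the contents of df2 are assumptions made for illustration.

```r
# Illustrative data; this df2 is an assumption, chosen to contain the keys (1,2), (2,3), (3,4).
df1 <- data.frame(a = c(1, 2, 2, 3, 4), b = c(2, 3, 4, 4, 4))
df2 <- data.frame(a = c(1, 1, 2, 3, 5), b = c(1, 2, 3, 4, 5))

# Approach 1: tag every row of df1 with its position before merging,
# so the original index survives the merge (row_id is a hypothetical helper column).
df1$row_id <- seq_len(nrow(df1))
matched <- merge(df1, df2, by = c("a", "b"))   # inner join on a AND b
idx <- sort(unique(matched$row_id))            # index of df1 rows that have a match in df2
df1$label <- df1$row_id %in% idx
df1$row_id <- NULL                             # drop the helper column again

# Approach 2: build one key per row and test membership, much like is.element on vectors.
# The tab separator is assumed not to occur in the data.
key1 <- paste(df1$a, df1$b, sep = "\t")
key2 <- paste(df2$a, df2$b, sep = "\t")
label2 <- key1 %in% key2

print(idx)   # 1 2 4
print(df1)
```

Both approaches are vectorised, so they avoid calling merge once per row as the apply attempt in the question does. Unlike merge with all.x=TRUE, they neither reorder df1 nor duplicate rows when df2 contains repeated keys.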
