
My two dataframes are:

df1<-structure(list(header1 = structure(1:4, .Label = c("a", "b", 
"c", "d"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))

and

df2<-structure(list(sample_x = structure(c(1L, 1L, 2L, 3L), .Label = c("0", 
"a", "c"), class = "factor"), sample_y = structure(c(1L, 3L, 
2L, 4L), .Label = c("0", "a", "m", "t"), class = "factor"), sample_z = structure(c(3L, 
2L, 1L, 1L), .Label = c("0", "a", "c"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))

A "0" in df2 means "no value".
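One way to make that convention explicit is to convert each factor column to character and turn every "0" into NA, so that later comparisons cannot mistake the placeholder for a real value. The sketch below rebuilds df2 from its decoded factor values (an assumption: it reads the `.Label`/code pairs above as plain strings). The name `df2_clean` is illustrative only.

```r
# Decoded copy of df2 from the question (factor columns; "0" means no value)
df2 <- data.frame(
  sample_x = c("0", "0", "a", "c"),
  sample_y = c("0", "m", "a", "t"),
  sample_z = c("c", "a", "0", "0"),
  stringsAsFactors = TRUE
)

# Convert each factor column to character and replace "0" with NA
df2_clean <- as.data.frame(
  lapply(df2, function(col) {
    col <- as.character(col)
    col[col == "0"] <- NA
    col
  }),
  stringsAsFactors = FALSE
)

str(df2_clean)
```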

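For each column of df2, I want to compare its values with df1$header1 and count how many values they share, how many appear only in df1, and how many appear only in df2. The result should be this data frame (df3):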

df3<-structure(list(sample_x = c(2L, 2L, 0L), sample_y = c(1L, 3L, 
2L), sample_z = c(2L, 2L, 0L)), class = "data.frame", row.names = c("overlap_df1_df2", 
"unique_df1", "unique_df2"))
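To make the three rows of df3 concrete, here is the sample_y column worked by hand with base R set functions. The vectors are typed in directly from the decoded data rather than taken from the data frames.

```r
# One column worked by hand: df1$header1 against df2$sample_y
df1_vals <- c("a", "b", "c", "d")
sample_y <- c("0", "m", "a", "t")

y <- sample_y[sample_y != "0"]                # drop the "no value" entries

overlap    <- length(intersect(df1_vals, y))  # "a"           -> 1
unique_df1 <- length(setdiff(df1_vals, y))    # "b", "c", "d" -> 3
unique_df2 <- length(setdiff(y, df1_vals))    # "m", "t"      -> 2

c(overlap_df1_df2 = overlap, unique_df1 = unique_df1, unique_df2 = unique_df2)
```

These match the sample_y column of df3 (1, 3, 2).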

I tried the data.table function foverlaps:

library(data.table)
setDT(df1); setDT(df2)   # setkeyv() needs data.table objects
setkeyv(df1, names(df1))
setkeyv(df2, names(df2))
df3<-foverlaps(df1,df2)

But it seems the two data frames need to share some column names, which is obviously not the case here. Thank you!

foverlaps is the wrong tool here (it joins on overlapping intervals); maybe read about ?setdiff - zx8754
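As a quick illustration of the base set functions the comment points to (documented together under ?setdiff): they compare distinct values, so duplicates collapse, and they know nothing about the "0" convention, so the placeholder is kept unless it is removed first. The vectors are sample_x and df1$header1 written out as character.

```r
ref <- c("a", "b", "c", "d")   # df1$header1
x   <- c("0", "0", "a", "c")   # df2$sample_x

intersect(ref, x)   # values in both
#> [1] "a" "c"
setdiff(ref, x)     # values only in ref
#> [1] "b" "d"
setdiff(x, ref)     # values only in x: duplicates collapsed, "0" still present
#> [1] "0"
```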

2 Answers


Loop through the columns of df2 and use set operations:

sapply(df2, function(i){
  # drop the "0" placeholders (and any NAs) before comparing
  x = as.character(i)
  x = x[ !is.na(x) & x != "0" ]
  o = intersect(df1$header1, x)
  u_df1 = setdiff(df1$header1, o)
  u_df2 = setdiff(x, o)
  c(o = length(o),
    u_df1 = length(u_df1),
    u_df2 = length(u_df2))
})
#       sample_x sample_y sample_z
# o            2        1        2
# u_df1        2        3        2
# u_df2        0        2        0
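If the result should look exactly like the df3 in the question (a data frame with rows overlap_df1_df2, unique_df1 and unique_df2), the same set-operation idea can return those names directly and be wrapped in as.data.frame(). This is a sketch; `overlap_counts` is a hypothetical helper, and the input data frames are rebuilt from the decoded values as plain character columns.

```r
df1 <- data.frame(header1 = c("a", "b", "c", "d"))
df2 <- data.frame(
  sample_x = c("0", "0", "a", "c"),
  sample_y = c("0", "m", "a", "t"),
  sample_z = c("c", "a", "0", "0")
)

# Hypothetical helper: counts for one df2 column against the reference values
overlap_counts <- function(col, ref, missing = "0") {
  vals <- as.character(col)
  vals <- vals[!is.na(vals) & vals != missing]   # drop "no value" entries
  c(overlap_df1_df2 = length(intersect(ref, vals)),
    unique_df1      = length(setdiff(ref, vals)),
    unique_df2      = length(setdiff(vals, ref)))
}

# sapply() gives an integer matrix; its row names become the data frame's
df3 <- as.data.frame(sapply(df2, overlap_counts, ref = as.character(df1$header1)))
df3
```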

A solution using purrr's map functions:

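library(purrr)

ref <- as.character(df1$header1)
# Drop the "0" placeholders (no value) from each df2 column
vals <- map(df2, ~ setdiff(as.character(.x), "0"))

rbind(
  overlap    = map_dbl(vals, ~ length(intersect(ref, .x))),
  unique_df1 = map_dbl(vals, ~ length(setdiff(ref, .x))),
  unique_df2 = map_dbl(vals, ~ length(setdiff(.x, ref)))
)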

#            sample_x sample_y sample_z
# overlap           2        1        2
# unique_df1        2        3        2
# unique_df2        0        2        0
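The count unique to df2 depends on df2's own values, so it has to be computed from them rather than derived from the other two counts. With the question's data, unique_df1 - overlap happens to give the same numbers, but a small made-up column (not from the question) shows that the shortcut does not hold in general:

```r
ref <- c("a", "b", "c", "d")
col <- c("a", "m", "t", "u", "v")          # made-up column for illustration

overlap    <- length(intersect(ref, col))  # 1: "a"
unique_df1 <- length(setdiff(ref, col))    # 3: "b", "c", "d"
unique_df2 <- length(setdiff(col, ref))    # 4: "m", "t", "u", "v"

unique_df1 - overlap                       # 2, which is not unique_df2
```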