I have the following two data frames:
> df1
# A tibble: 4 x 4
x y z w
<dbl> <dbl> <dbl> <dbl>
4 5 8 9
4 6 7 4
3 6 7 10
8 2 8 9
> df2
# A tibble: 4 x 4
x y z w
<dbl> <dbl> <dbl> <dbl>
6 2 7 9
2 6 7 10
4 5 8 12
4 5 8 3
I would like to find which rows in df2 have a match in df1, where a match means being identical in at least n/2 columns, with n the number of columns (here n = 4, so at least 2 columns must agree).
So in this example, row 1 in df2 matches row 4 in df1 (columns 2 and 4), row 2 in df2 matches row 2 in df1 on columns 2 and 3 and row 3 in df1 on columns 2, 3 and 4, and so on.
I also have to save the positions of the matching rows (in both data frames) and the columns on which they match.
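To make the rule concrete, here is a minimal sketch for a single pair of rows. The helper names `matching_cols` and `is_match` are made up for illustration, and the rows are copied from the example above.

```r
# Hypothetical helpers, named here only for illustration
# Positions of the columns where the two rows hold the same value
matching_cols <- function(a, b) unname(which(a == b))
# A pair is a match when at least half of the columns agree
is_match <- function(a, b) length(matching_cols(a, b)) >= length(a) / 2

df2_row2 <- c(x = 2, y = 6, z = 7, w = 10)
df1_row2 <- c(x = 4, y = 6, z = 7, w = 4)
df1_row4 <- c(x = 8, y = 2, z = 8, w = 9)

cols_22  <- matching_cols(df2_row2, df1_row2)  # columns 2 and 3
match_22 <- is_match(df2_row2, df1_row2)       # TRUE: 2 of 4 columns agree
match_24 <- is_match(df2_row2, df1_row4)       # FALSE: no column agrees
```

What I need to store for each matching pair is the row position in df2, the row position in df1, and the column positions returned by something like `matching_cols`.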
For small data sets, I could replicate both data sets so that every row of df2 is paired with every row of df1, subtract them, and count the zeros in each row. However, what I need is a solution that works on very large data sets (~20K rows).
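A minimal sketch of that brute-force idea in base R, assuming both tables have the same columns in the same order and hold exactly comparable values (integers stored as doubles, as in the example). The names `pairs`, `is_equal` and `matches` are illustrative only.

```r
# Example data from the question
df1 <- data.frame(x = c(4, 4, 3, 8), y = c(5, 6, 6, 2),
                  z = c(8, 7, 7, 8), w = c(9, 4, 10, 9))
df2 <- data.frame(x = c(6, 2, 4, 4), y = c(2, 6, 5, 5),
                  z = c(7, 7, 8, 8), w = c(9, 10, 12, 3))

n <- ncol(df1)

# Every (df2 row, df1 row) pair: nrow(df1) * nrow(df2) rows in total,
# which is what makes this approach impractical for ~20K rows per table
pairs <- expand.grid(row_df2 = seq_len(nrow(df2)),
                     row_df1 = seq_len(nrow(df1)))

# Replicate both tables into the pair layout and subtract;
# a zero marks a cell where the two rows agree
diffs <- as.matrix(df2[pairs$row_df2, ]) - as.matrix(df1[pairs$row_df1, ])
is_equal <- diffs == 0

# Count the agreeing columns and record their positions for each pair
pairs$n_equal <- rowSums(is_equal)
pairs$cols <- unname(apply(is_equal, 1,
                           function(r) paste(which(r), collapse = ",")))

# Keep the pairs that agree on at least half of the columns
matches <- pairs[pairs$n_equal >= n / 2, ]
matches <- matches[order(matches$row_df2, matches$row_df1), ]
rownames(matches) <- NULL
```

On the example data, `matches` holds five pairs: df2 row 1 with df1 row 4 (columns 2,4), df2 row 2 with df1 rows 2 (2,3) and 3 (2,3,4), and df2 rows 3 and 4 each with df1 row 1 (1,2,3). With 20K rows in each table the intermediate matrices would have 400 million rows, so this only illustrates the output I am after.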
Any ideas? A dplyr solution (rather than a data.table one) would be highly appreciated.