2
votes

I have data.frame df1 and a data.frame df2. How do I use df2 to mutate/transform df1 to merged data.frame where: column name will be filled with the value on df2$name if df1$id >= df2$start and <= df2$end.

df1 = data.frame(id = 1:10, c = letters[1:10])
df2 = data.frame(name = LETTERS[1:3], start = c(2, 5, 8), end = c(4,7, 9))
merged = data.frame(id = df1$id, c = df1$c, name = c(NA, "A", "A", "A", "B", "B", "B", "C", "C", NA) )

Visually:

> df1
   id c
1   1 a
2   2 b
3   3 c
4   4 d
5   5 e
6   6 f
7   7 g
8   8 h
9   9 i
10 10 j
> df2
  name start end
1    A     2   4
2    B     5   7
3    C     8   9
> merged
   id c name
1   1 a <NA>
2   2 b    A
3   3 c    A
4   4 d    A
5   5 e    B
6   6 f    B
7   7 g    B
8   8 h    C
9   9 i    C
10 10 j <NA>
1

1 Answers

2
votes

We can use non-equi join with data.table and assign a new column with the corresponding values of 'name' where the conditional join is met

library(data.table)
setDT(df1)[df2, cn := name, on = .(id > start, id <= end)]
df1
#    id c   cn
# 1:  1 a <NA>
# 2:  2 b <NA>
# 3:  3 c    A
# 4:  4 d    A
# 5:  5 e <NA>
# 6:  6 f    B
# 7:  7 g    B
# 8:  8 h <NA>
# 9:  9 i    C
#10: 10 j <NA>

Or another option is fuzzyjoin

library(fuzzyjoin)
library(dplyr)
fuzzy_left_join(df1, df2, by = c('id' = 'start', 'id' = 'end'), 
         match_fun = list(`>`, `<=`)) %>%
     select(id, c, cn = name)