2
votes

I have the following 2 data tables:

DT1 <- data.table(A = c(100,50,10), B = c("Good","Ok","Bad"))
DT1
     A    B
1: 100 Good
2:  50   Ok
3:  10  Bad

and

DT2 <- data.table(A = c(99,34,5,"",24,86))
DT2
    A
1: 99
2: 34
3:  5
4:   
5: 24
6: 86   

What I would like to return when joining DT1 and DT2 is

DT2
    A       B
1: 99    Good
2: 34    Ok
3:  5    Bad
4:       NA
5: 24    Ok
6: 86    Good

The "roll" option in data.table is only for "nearest" match so it doesnt work in my case. Is there any way I can do such lookup with data.table?

2
The value 24 is closer to Bad than to Ok, at least as measured by absolute distance. Then why did you choose Ok as the matched value?Tim Biegeleisen

2 Answers

3
votes

The rolling join does work for me if rolled backwards (NOCB = next obervation carried backwards):

library(data.table)
DT1 <- data.table(A = c(100, 50, 10), B = c("Good", "Ok", "Bad"))
DT2 <- data.table(A = c(99, 34, 5, "", 24, 86))

DT2[, A := as.numeric(A)]
DT1[DT2, on = "A", roll = -Inf]
    A    B
1: 99 Good
2: 34   Ok
3:  5  Bad
4: NA <NA>
5: 24   Ok
6: 86 Good

Note that this does only work if both columns A are numeric (or integer). By using "", the OP has turned DT2$A into a character column.

1
votes

Here is a base R approach

df1 <- as.data.frame(DT1)
df2 <- as.data.frame(DT2)

df2$B <- apply(df2, 1, function(x) {
    if(x != "") df1$B[which.min(abs(as.numeric(x) - df1$A))] else NA
})
df2
#    A    B
# 1 99 Good
# 2 34   Ok
# 3  0  Bad
# 4    <NA>
# 5 24  Bad
# 6 86 Good

Or the same using data.tables

DT2[, B := apply(DT2, 1, function(x) 
    if(x != "") DT1$B[which.min(abs(as.numeric(x) - DT1$A))] else NA)]
DT2
#    A    B
#1: 99 Good
#2: 34   Ok
#3:  0  Bad
#4:      NA
#5: 24  Bad
#6: 86 Good

We match based on the smallest absolute difference between DT1$A and DT2$A values.


Sample data

DT1 <- data.table(A = c(100,50,0), B = c("Good","Ok","Bad"))
DT2 <- data.table(A = c(99,34,0,"",24,86))