Match() and the dynamic selection of columns of a data.table

Question

I am trying to use the match() function when dynamically selecting a column from a data.table DT1 while applying a filter and matching it to the column of another data.table DT2. It doesn't work, presumably because when one selects a column dynamically from a data.table, the output is again a data.table rather than a vector.

An example:

col1 = c(1, 3, 1, 3, 2, 2)
col2 = c("Europe", "Europe", "Europe", "Europe", "Asia", "Asia")
DT1 = data.table(col1, col2)

col3 = 1:3
col4 = c( "carrot", "apple", "tomato")
DT2 = data.table(col3, col4)

This works:

> match(DT1[col2 == "Europe", col1], DT2[, col3])
[1] 1 3 1 3

This (what I need) doesn't work:

> columnName = "col1"
> match(DT1[col2 == "Europe", columnName, with = FALSE], DT2[, col3])
[1] NA

Only when col1 is entered as a variable in j of a data.table, the result is a vector. Could this be the reason? To illustrate what I mean:

> DT1[, col1]
[1] 1 3 1 3 2 2
> DT1[, "col1"]
   col1
1:    1
2:    3
3:    1
4:    3
5:    2
6:    2

I thought match() would deal with this automatically because itshelp file states:

"Factors, raw vectors and lists are converted to character vectors, and then x and table are coerced to a common type (the later of the two types in R's ordering, logical < integer < numeric < complex < character) before matching. If incomparables has positive length it is coerced to the common type."

You can try subsetting with [[, i.e. match(DT1[col2 == "Europe"][[columnName]], DT2[, col3]) — talat

Untitpoi Untitpoi · Accepted Answer · 2017-12-19T10:42:40

It seems to work when not using data.table, I think you should be careful when using col1 because there is confusion between the vector col1 and the column of the data.frame or data.table.

DT1 = data.frame(col1=c(1, 3, 1, 3, 2, 2), col2=c("Europe", "Europe", "Europe", "Europe", "Asia", "Asia"))
DT2 = data.frame(col3 = 1:3, col4 = c( "carrot", "apple", "tomato"))

match(DT1[DT1$col2 == "Europe", "col1"], DT2[, "col3"])

columnName = "col1"
match(DT1[DT1$col2 == "Europe", columnName], DT2[, "col3"])

Match() and the dynamic selection of columns of a data.table

2 Answers