Retrieve the (unique) keys used by a data.table

Question

Is there a data.table-idiomatic way to obtain the unique key values of a data.table when the key is a single column?

I am working with a number of data sets, each of around 10 million rows, and want to keep function calls and overhead to a minimum. With the toy example below,

require(data.table)
d_test<-data.table(id=c(1,1,2,7,2,3,5),
                   amt=c(100,200,400,600,231,-100,-200),
                   pay=c(-2,rep(1:3,2)),
                   key="id")

the output I am seeking is equivalent to, either as a vector or data.table,

unique(d_test[,.(id)]), or unique(d_test$id)

that is, c(1,2,3,5,7)

I guess the idiomatic way might be unique(d_test[, key(d_test), with=FALSE]), but for the specific case of a single-column key, your approaches and George's (in the answer below) seem fine. Note that the default of unique.data.table is to go by=key(x). See ?unique.data.table — Frank
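A minimal sketch of the key-generic idiom from the comment above, run on the toy data from the question. It selects the key column(s) with key(d_test) and de-duplicates only that subset, so the key name is not hard-coded. The name keys_dt is introduced here purely for illustration.

```r
library(data.table)

d_test <- data.table(id  = c(1, 1, 2, 7, 2, 3, 5),
                     amt = c(100, 200, 400, 600, 231, -100, -200),
                     pay = c(-2, rep(1:3, 2)),
                     key = "id")

# Keep only the key column(s), then de-duplicate that one-column table.
keys_dt <- unique(d_test[, key(d_test), with = FALSE])

# As a plain vector instead of a data.table:
keys_vec <- keys_dt$id
```

Because only the key columns are left before unique() is called, the result should not depend on which columns unique.data.table uses by default.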

jangorecki · Accepted Answer · 2016-07-15T14:38:45

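Another way: take the rows that are unique by the key column, then drop every non-key column by reference.

k="id"
d2=unique(d_test, by=k)                      # one row per distinct key value (a new table)
set(d2, j=setdiff(names(d2),k), value=NULL)  # delete the non-key columns by reference
d2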
#   id
#1:  1
#2:  2
#3:  3
#4:  5
#5:  7

This will be easier once #1269 (Returning only groups) is implemented.
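Until then, the same idea can be wrapped in a small helper that also copes with multi-column keys. This is only a sketch: unique_keys is a hypothetical name, not a data.table function, and the example table dt is made up for illustration. by is passed explicitly so the result does not rely on the default of unique.data.table, which, as far as I know, changed from the key to all columns in data.table 1.9.8.

```r
library(data.table)

# Hypothetical helper: one row per distinct key combination, key columns only.
unique_keys <- function(dt) {
  k <- key(dt)
  stopifnot(!is.null(k))                  # the table must be keyed
  unique(dt, by = k)[, k, with = FALSE]   # de-duplicate, then keep key columns
}

# Made-up example with a two-column key.
dt <- data.table(a = c(1, 1, 2, 2),
                 b = c("x", "x", "y", "z"),
                 v = 1:4,
                 key = c("a", "b"))

res <- unique_keys(dt)
```

Here res should contain the three distinct (a, b) pairs and no v column, and dt itself is left unchanged because unique() returns a new table.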
