How do I select rows by two criteria in data.table in R?

Question

Let's say I have a data.table and I want to select all the rows where the variable x has the value "b". That is easy:

library(data.table)
DT <- data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
setkey(DT,x)               # set a 1-column key
DT["b"]

It appears that one has to set a key for this to work: if the key is not set to x, DT["b"] fails. As an aside, what would happen if I set two columns as keys?
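On the two-column key question, here is a minimal sketch of what I would expect, assuming a reasonably recent data.table (the `.()` alias and the `on=` argument are not available in very old releases). The names `only_b`, `b_and_3`, `DT2` and `no_key` are just illustrative.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# Two-column key: rows are sorted by x, then by y within each x.
setkey(DT, x, y)
key(DT)

# Supplying a value for only the first key column still works.
only_b <- DT[.("b")]

# Supplying values for both key columns matches on x and y together.
b_and_3 <- DT[.("b", 3)]

# Assumption: with a recent data.table, no key is needed if the
# column to match on is named explicitly via on=.
DT2 <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
no_key <- DT2[.("b"), on = "x"]
```

So with a two-column key, the values passed in `i` are matched against the key columns in order, and it is fine to supply only the leading one.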

Anyway, moving along, let's say that I want to select all the rows where the variable x is "a" or "b":

DT["b"|"a"]

does not work.

But the following works:

DT[x=="a"|x=="b"]

But that uses vector scanning, as with data frames, rather than binary search on the key. I guess for smaller data sets it will not matter.

Is that what I should do or am I ignorant of data.table syntax?

And one more thing. Are there any examples of more complex Boolean multi-variable selection (or subset) procedures with data.table?
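For what it is worth, here is a small sketch of multi-variable Boolean selection on the same toy table. It only uses ordinary logical expressions in `i`, plus one keyed subset chained with a filter; the result names (`multi`, `chained`, `mixed`) are made up for illustration.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
setkey(DT, x)

# Conditions on two columns combined with & (plain logical filtering).
multi <- DT[x %in% c("a", "b") & y > 1]

# Keyed subset on x first, then an ordinary filter on y, chained with [][].
chained <- DT[.(c("a", "b"))][y > 1]

# Mixing & and | across three columns; parentheses make precedence explicit.
mixed <- DT[(x == "c" & v >= 8) | y == 1]
```

The first two forms return the same rows here; which one is faster on a large table is something I would measure rather than assume.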

I know I could always revert to using the subset() function since a data.table will behave as a data.frame if it must.

A detailed worked example of multi-column key is in the Introduction vignette. — Matt Dowle
Also, I am not sure how well known this is, but you can work through the output of example(data.table) at the prompt; there are examples there. — Matt Dowle

Farrel Farrel · Accepted Answer · 2011-12-14T20:10:54

Here is a way that only crossed my mind after I asked the question. It works, but I do not know how it performs in benchmarks, since I am not currently at a computer with R installed (I guess I should use a cloud instance). Anyway, I like the syntax:

DT[c("a","b")]
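Since the benchmark question is left open, here is a rough sketch for checking it yourself. The first part confirms that the keyed lookup returns the same rows as the vector-scan form on the toy table. The second part times both on a larger table; `big` and its size are made up for illustration, and the timings will depend on your machine and data.table version, so no results are shown.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
setkey(DT, x)

# Keyed lookup with a character vector versus the logical-expression form.
keyed   <- DT[c("a", "b")]
scanned <- DT[x == "a" | x == "b"]
same_rows <- identical(keyed$v, scanned$v)

# Hypothetical larger table for a rough timing comparison.
set.seed(1)
n <- 1e6
big <- data.table(x = sample(letters, n, replace = TRUE), v = seq_len(n))
setkey(big, x)

t_keyed   <- system.time(big_keyed <- big[c("a", "b")])
t_scanned <- system.time(big_scanned <- big[x == "a" | x == "b"])
print(t_keyed)
print(t_scanned)
```

For a serious comparison you would want to repeat each call several times and vary the table size; a single system.time() call is only a first look.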
