Usually, when you subset on a key column and want to take advantage of the fast binary-search-based subset, you'd do:
DT[J(values)] # assuming subset here is on the first key column.
# (or)
DT[.(values)] # idem
Both . and J here do exactly the same thing. When your key column is of type character, you already have to quote the value, so for convenience data.table also lets you do the join without the J or . wrapper. That is,
DT["a"] # subset on the first key column if one exists
# (or)
DT[J("a")] # idem
# (or)
DT[.("a")] # idem
This shortcut applies only to character vectors. It's possible because there is no other way to subset a data.table with a character vector in i, so it's easy to tell that you want a join. But if you write DT[2], with 2 being numeric, data.table can't tell whether you expect a join or a normal row subset. That's why it's limited to characters.
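To make the difference concrete, here is a minimal sketch. The tables DT and DN and their columns (id, k, val) are made-up toy data, not from the question.

```r
library(data.table)

# Hypothetical toy table with a character key column
DT <- data.table(id = c("b", "a", "c", "a"), val = 1:4)
setkey(DT, id)        # sorts DT by id and marks id as the key

r1 <- DT["a"]         # join on the first key column
r2 <- DT[J("a")]      # same join, written with J()
r3 <- DT[.("a")]      # same join, written with .()

# Hypothetical toy table with a numeric key column
DN <- data.table(k = c(30, 10, 20), val = 1:3)
setkey(DN, k)         # rows are now ordered by k: 10, 20, 30

row2  <- DN[2]        # plain row subset: the second row (k == 20)
join2 <- DN[J(2)]     # join on k == 2: no match, so val comes back as NA
```

The three character forms return the same rows, while DN[2] and DN[J(2)] return different things, which is why the bare form can't be offered for numeric keys.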
Now, DT[J(.)] will be fast because, when the key is set, the data is already sorted, so we can subset using fast binary search. However, DT[x < .] uses the normal vector scan approach. That is, it compares every value of x against the given value, even though the values are sorted by x. Therefore, the second one will be slower than the first.
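If you want to see this on your own machine, a rough sketch along these lines should do. The table size, the column names x and y, and the value 500 are arbitrary choices for illustration, and the timings will vary with your hardware and data.table version.

```r
library(data.table)

set.seed(1)
n  <- 1e6L
# Hypothetical data: an integer column to key on, plus a payload column
DT <- data.table(x = sample(1000L, n, replace = TRUE), y = runif(n))
setkey(DT, x)                               # sort by x and set it as the key

t_join <- system.time(a <- DT[J(500L)])     # binary search on the key
t_scan <- system.time(b <- DT[x < 500L])    # evaluates x < 500L on every row

print(t_join)
print(t_scan)
```

Keep in mind the two calls answer different questions (an exact match versus a range), so this only illustrates the cost of the two access paths, not two ways of getting the same result.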
There are feature requests to enable binary-search-based subsets on ranges. You can have a look here. Once it's implemented, these things will automatically get faster. We've not gotten to it just yet.
HTH.
PS: Note that you're comparing DT["2"], which is a binary search subset on a character key column, against DT[key < 2], where key is numeric. They're not the same. The numeric equivalent (as I explained above) is DT[J(2)].
Also note that they are not equivalent operations. DT[J(2)] looks only for rows whose key column matches 2, whereas DT[key < 2] looks for all values in the range [min(key), 2).
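A small sketch of that last point, using a made-up table in which the column k stands in for your numeric key column:

```r
library(data.table)

# Hypothetical toy data; k plays the role of the numeric key column
DT <- data.table(k = c(3, 1, 2, 2, 0), v = letters[1:5])
setkey(DT, k)      # rows are now ordered by k: 0, 1, 2, 2, 3

eq <- DT[J(2)]     # only the rows where k matches 2
lt <- DT[k < 2]    # every row with k in [min(k), 2)
```

Here eq holds the two rows with k equal to 2, and lt holds the rows with k equal to 0 and 1, so the two results don't overlap at all.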
DT[2] is NOT the same as DT[key==2]! And if you really set the key, your second version should be extremely fast. - shadow
DT[2] does a binary search (fast), while DT[key==2] does a vector scan (slow). However I get 0.03 seconds for DT[2] and 0.5 seconds for DT[key<2]. Is it possible to have a fast DT[key<2]? - misha_dodic
DT[2] gives second row not the row where key == 2 - CHP