147
votes

How can we select multiple columns using a vector of their numeric indices (position) in data.table?

This is how we would do with a data.frame:

df <- data.frame(a = 1, b = 2, c = 3)
df[ , 2:3]
#   b c
# 1 2 3
5

5 Answers

188
votes

For versions of data.table >= 1.9.8, the following all just work:

library(data.table)
dt <- data.table(a = 1, b = 2, c = 3)

# select single column by index
dt[, 2]
#    b
# 1: 2

# select multiple columns by index
dt[, 2:3]
#    b c
# 1: 2 3

# select single column by name
dt[, "a"]
#    a
# 1: 1

# select multiple columns by name
dt[, c("a", "b")]
#    a b
# 1: 1 2

For versions of data.table < 1.9.8 (for which numerical column selection required the use of with = FALSE), see this previous version of this answer. See also NEWS on v1.9.8, POTENTIALLY BREAKING CHANGES, point 3.

45
votes

It's a bit verbose, but i've gotten used to using the hidden .SD variable.

b<-data.table(a=1,b=2,c=3,d=4)
b[,.SD,.SDcols=c(1:2)]

It's a bit of a hassle, but you don't lose out on other data.table features (I don't think), so you should still be able to use other important functions like join tables etc.

39
votes

If you want to use column names to select the columns, simply use .(), which is an alias for list():

library(data.table)
dt <- data.table(a = 1:2, b = 2:3, c = 3:4)
dt[ , .(b, c)] # select the columns b and c
# Result:
#    b c
# 1: 2 3
# 2: 3 4
20
votes

From v1.10.2 onwards, you can also use ..

dt <- data.table(a=1:2, b=2:3, c=3:4)

keep_cols = c("a", "c")

dt[, ..keep_cols]
3
votes

@Tom, thank you very much for pointing out this solution. It works great for me.

I was looking for a way to just exclude one column from printing and from the example above. To exclude the second column you can do something like this

library(data.table)
dt <- data.table(a=1:2, b=2:3, c=3:4)
dt[,.SD,.SDcols=-2]
dt[,.SD,.SDcols=c(1,3)]