
What's the correct way to remove multiple columns from a data.table? I'm currently using the code below, but was getting unexpected behavior when I accidentally repeated one of the column names. I wasn't sure if this was a bug, or if I shouldn't be removing columns this way.

library(data.table)
DT <- data.table(x = letters, y = letters, z = letters)
DT[ ,c("x","y") := NULL]
names(DT)
[1] "z"

The above works fine, but when I accidentally repeat a column name, y is deleted as well:

DT <- data.table(x = letters, y = letters, z = letters)
DT[ ,c("x","x") := NULL]
names(DT)
[1] "z"
+1. Interesting find. If you delete two "y"s, you get "x" left over. And if you delete two "z"s, it crashes! – Frank
For now you could wrap the LHS of the := assignment in a call to unique() (i.e., use DT[ ,unique(c("x","x")) := NULL]) to be extra defensive. Since this seems like a data.table bug, I'd guess you'll only have to do that until Matthew Dowle moves that call to unique() (or something equivalent to it) inside [.data.table(). – Josh O'Brien
Good idea about unique. Thanks. – matt_k
Hello guys, perhaps you know why this R code does not work for me: myCols <- c("Col1", "Col2"); DT[, myCols:=NULL]? Suppose that DT contains both columns. – MindaugasK
@MindaugasK I found a solution to that: you still have to wrap your vector of column names in c() for it to work. Change it to DT[, c(myCols):=NULL] and that should do the trick. See rdatatable.gitlab.io/data.table/articles/… – Vince
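Both workarounds from the comments can be sketched as follows, assuming a data.table version whose := accepts a character vector of column names on its left-hand side. DT1, DT2 and myCols are illustration names only. The parenthesised form (myCols) := NULL, which the ?`:=` help page uses for the same purpose, should behave like c(myCols) here.

```r
library(data.table)

# Workaround 1: de-duplicate the LHS before deleting, so a repeated
# name cannot cause an extra column to be dropped.
DT1 <- data.table(x = letters, y = letters, z = letters)
DT1[, unique(c("x", "x")) := NULL]
print(names(DT1))  # only x is removed; y and z remain

# Workaround 2: when the names are stored in a variable, wrap it in c()
# (or parentheses) so data.table evaluates the variable instead of
# treating myCols as the literal name of a single column.
DT2 <- data.table(x = letters, y = letters, z = letters)
myCols <- c("x", "y")
DT2[, c(myCols) := NULL]
print(names(DT2))  # "z"
```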

2 Answers


This looks like a solid, reproducible bug. It's been filed as Bug #2791.

It appears that repeating a column name causes the column after it to be deleted as well.
If there is no subsequent column left to delete, R crashes.
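To see how a repeated name could remove the following column, here is a base-R model that reproduces every symptom reported in this thread (x twice leaves z, y twice leaves x, z twice fails). It is an assumption for illustration only, not data.table's actual C code: it supposes names are matched to positions once and columns are then removed from the highest position down, which is safe for distinct positions but not for a repeated one. The function name buggy_delete is hypothetical.

```r
# Hypothetical model of the reported bug, NOT data.table's real C code.
# Assumption: names are matched to column positions once, then columns are
# removed from the highest position down. That order is safe when the
# positions are distinct, but a repeated position removes whatever column
# shifted into that slot after the first deletion.
buggy_delete <- function(cols, drop) {
  pos <- sort(match(drop, names(cols)), decreasing = TRUE)
  for (p in pos) {
    # The real bug crashed R at this point; the model raises an error instead.
    if (p > length(cols)) stop("column position ", p, " no longer exists")
    cols[[p]] <- NULL  # removes element p; later elements shift left
  }
  cols
}

cols <- list(x = 1, y = 2, z = 3)
names(buggy_delete(cols, c("x", "y")))  # "z" (distinct names work)
names(buggy_delete(cols, c("x", "x")))  # "z" (y is lost too)
names(buggy_delete(cols, c("y", "y")))  # "x" (z is lost too)
try(buggy_delete(cols, c("z", "z")))    # error: position 3 no longer exists
```

Passing the positions through unique() before the loop makes all four cases behave, which is the same idea as the unique() workaround suggested in the comments.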


UPDATE: Now fixed in v1.8.11. From NEWS:

Assigning to the same column twice in the same query is now an error rather than a crash in some circumstances; e.g., DT[,c("B","B"):=NULL] (delete by reference the same column twice). Thanks to Ricardo (#2751) and matt_k (#2791) for reporting. Tests added.


This question has already been answered, but consider this a side note.

I prefer the following syntax for dropping multiple columns:

DT[ ,`:=`(x = NULL, y = NULL)]

because it matches the syntax for adding multiple columns (variables):

DT[ ,`:=`(x = letters, y = "Male")]

This form also checks for duplicated column names, so trying to drop x twice will throw an error message.
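A short round trip shows why the two forms read alike: the same `:=`(name = value, ...) call first adds the two columns from the answer and then drops them by assigning NULL. This is a sketch assuming a current data.table; the starting table with a single z column is arbitrary.

```r
library(data.table)

DT <- data.table(z = letters)

# Add two columns in one call; the length-1 value "Male" is recycled to all rows.
DT[, `:=`(x = letters, y = "Male")]
added <- names(DT)
print(added)  # "z" "x" "y"

# Drop both columns with the same functional form by assigning NULL.
DT[, `:=`(x = NULL, y = NULL)]
print(names(DT))  # "z"
```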