My question is related to assignment by reference versus copying in data.table. I want to know if one can delete rows by reference, similar to
DT[ , someCol := NULL]
Is there an equivalent for rows, something like
DT[someRow := NULL, ]
I guess there's a good reason why this function doesn't exist, so maybe you could just point out a good alternative to the usual copying approach shown below. In particular, going with my favourite from example(data.table):
library(data.table)
DT = data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
#      x y v
# [1,] a 1 1
# [2,] a 3 2
# [3,] a 6 3
# [4,] b 1 4
# [5,] b 3 5
# [6,] b 6 6
# [7,] c 1 7
# [8,] c 3 8
# [9,] c 6 9
Say I want to delete the first row from this data.table. I know I can do this:
DT <- DT[-1, ]
but often we may want to avoid that, because it copies the object (which requires about 3*N memory, where N = object.size(DT)), as pointed out here.
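One way to see that DT[-1, ] copies rather than deletes in place is data.table's address() helper, which returns the memory address of an object as a string. A minimal sketch, rebuilding the example table so the snippet runs on its own; before and after are just illustrative variable names:

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# Record where the original table lives in memory
before <- address(DT)

# Negative row indexing builds a brand-new data.table, then the name DT is rebound to it
DT <- DT[-1, ]
after <- address(DT)

# FALSE: DT now points at a different object, so the rows were copied, not removed in place
print(identical(before, after))
```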
Now I found set(DT, i, j, value). I know how to set specific values with it (here: set all values in rows 1 and 2 and columns 2 and 3 to zero):
set(DT, 1:2, 2:3, 0)
DT
#      x y v
# [1,] a 0 0
# [2,] a 0 0
# [3,] a 6 3
# [4,] b 1 4
# [5,] b 3 5
# [6,] b 6 6
# [7,] c 1 7
# [8,] c 3 8
# [9,] c 6 9
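By contrast with the subset-and-reassign approach, set() writes into the existing columns, so the table's address should not change. A minimal sketch using data.table's address() to check this; before is an illustrative name, and depending on the data.table version, assigning the double 0 into the integer column v may or may not emit a coercion warning:

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
before <- address(DT)

# Same call as above: rows 1-2, columns 2-3 set to zero, modified in place
set(DT, 1:2, 2:3, 0)

# The data.table object itself was not reallocated
print(identical(address(DT), before))
```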
But how can I erase the first two rows, say? Doing
set(DT, 1:2, 1:3, NULL)
sets the entire DT to NULL.
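The reason this happens is that a NULL value in set() (as with := NULL) is interpreted as "remove these columns", and there is no matching row-wise meaning for i. A minimal sketch of the column deletion that set() does support by reference:

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# NULL as value drops the whole column v in place; no i is given
set(DT, j = "v", value = NULL)

print(names(DT))
```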
My SQL knowledge is very limited, so you guys tell me: given data.table uses SQL technology, is there an equivalent to the SQL command
DELETE FROM table_name
WHERE some_column=some_value
in data.table?
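As far as I know there is no in-place DELETE in data.table; the usual way to express the SQL statement is to keep the complement of the WHERE condition and reassign, which still copies. A minimal sketch with y = 3 standing in for some_column = some_value:

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# Roughly: DELETE FROM DT WHERE y = 3
# %in% returns FALSE (not NA) for missing y, so rows with NA y are kept,
# just as SQL would not delete rows where the condition is unknown
DT <- DT[!(y %in% 3)]

print(DT)
```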
data.table() doesn't use SQL technology so much as one can draw a parallel between the different operations in SQL and the various arguments to a data.table. To me, the reference to "technology" somewhat implies that data.table is sitting on top of a SQL database somewhere, which AFAIK is not the case. – Chase
… DT[ , keep := .I > 1], then subset for later operations: DT[(keep), ...], perhaps even setindex(DT, keep) to improve the speed of this subsetting. Not a panacea, but worthwhile to consider as a design choice in your workflow: do you really want to delete all those rows from memory, or would you prefer to exclude them? The answer differs by use case. – MichaelChirico
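The flag-column idea from the last comment can be sketched as follows. keep is the column name used in the comment; total_v and n_kept are illustrative names, and sum(v) is just an arbitrary stand-in for "later operations":

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# Flag rows instead of deleting them; := adds the column by reference.
# .I is the row number, so only row 1 gets keep = FALSE
DT[, keep := .I > 1]

# Optional secondary index on the flag column
setindex(DT, keep)

# Parentheses make data.table evaluate keep as a logical column inside DT
total_v <- DT[(keep), sum(v)]
n_kept <- DT[(keep), .N]

print(c(total_v = total_v, n_kept = n_kept))
```

All nine rows stay in memory; the excluded row is simply skipped by every query that filters on (keep).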