I'm brand new to the (completely marvelous) data.table package, and seem to have gotten stuck on a very basic, somewhat bizarre problem. I can't post the exact data set I'm working with, for which I apologize -- but I think the problem is simple enough to articulate that hopefully this will still be very clear.
Let's say I have a data.table like so, with key x:
set1
   x y
1: 1 a
2: 1 b
3: 1 c
4: 2 a
I want to return a subset of set1 containing all rows where x == 1. This is wonderfully simple in data.table: set1[J(1)]. Bam. Done. I can also assign z <- 1, and call set1[J(z)]. Again: works great.
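For reproducibility, here is a minimal sketch of the toy example above. The column types (numeric x, character y) are an assumption chosen to match the printout; my real data set is much larger and has different columns.

```r
library(data.table)

# Toy table matching the printout above (column types are assumed)
set1 <- data.table(x = c(1, 1, 1, 2), y = c("a", "b", "c", "a"))
setkey(set1, x)  # key the table on x so that J() lookups use it

# Look up by a literal key value
res_literal <- set1[J(1)]

# Look up by a key value held in a variable
z <- 1
res_variable <- set1[J(z)]

print(res_literal)
print(res_variable)
```

On this toy table both calls return the three rows where x == 1.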
...except when I try to scale it up to my actual data set, which contains ~6M rows. When I call set1[J(1674)], I get back a 78-row return that's exactly what I'm looking for. But I need to be able to look up (literally) 4M of these subsets. When I assign the value I'm searching for to a variable, id <- 1674, and call set1[J(id)]... R nearly takes down my desktop.
Clearly something I don't understand is going on under the data.table hood, but I haven't been able to figure out what. Googling and slogging through Stack Overflow suggest that this should work. Out of pure whimsy, I've tried:
id <- quote(1674)
set1[J(eval(id))]
...but that is far, far worse. What... what's going on?
In top, when I call set1[J(id)], rsession starts using up to 97% of system memory. The box becomes functionally unusable until I manage to kill the rsession process some time later. This is in contrast to set1[J(1674)], which returns 78 rows as soon as I press enter. - Gastove