data.table: subsetting a grouping variable in j with keyby

Question

Say I have this dataset

library(data.table)
test <- data.table(X = rep(1, 3), Y = rep("a", 3))

which gives us

test
#   X Y
#1: 1 a
#2: 1 a
#3: 1 a

I'm wondering why

test[, X[Y == "a"], keyby = .(X)]

gives

#   X V1
#1: 1  1
#2: 1 NA
#3: 1 NA

Thank you in advance for your answers!

Did you mean to do test[Y == 'a', .SD, keyby = .(X)] or test[, .SD[Y == "a"], keyby = .(X)]? — akrun
Not sure why you would subset the grouping column itself: within each group the grouping column is a single element, while Y == 'a' returns 3 values, so the result is filled with NA (unless you replicate X). — akrun
It's standard R behavior for out-of-bounds indexing. See the R Language Definition, section 3.4.1 Indexing by vectors: "If i is positive and exceeds length(x) then the corresponding selection is NA". Combine that with the fact that inside each group the grouping variable has length 1 (see FAQ 2.10). — Henrik
Indeed. The length of the logical index (i) is 3. The length of the grouping variable inside each group (x, the vector you try to index) is 1; again, see the FAQ. "If i is positive and exceeds length(x) then the corresponding selection is NA": here the index vector i has length 3 and the indexed vector x has length 1, so the extra positions come back as NA. — Henrik
Side-note: thanks for posting such a small, illustrative toy data set! — Henrik

ThomasIsCoding · Accepted Answer · 2021-04-07T21:43:35

If you run X and Y=="a" separately

> test[, X, keyby = .(X)]
   X X
1: 1 1

> test[, Y == "a", keyby = .(X)]
   X   V1
1: 1 TRUE
2: 1 TRUE
3: 1 TRUE

you will see that the first one returns the numeric value 1 with length 1, and the second one returns logical TRUE values with length 3.

Since the lengths do not match when subsetting, the positions beyond the length of X are filled with NA, e.g.,

> 1[rep(TRUE,3)]
[1]  1 NA NA
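
To make the length mismatch explicit, and to show two possible ways around it, here is a minimal sketch using the same toy data. The names lens, len_X, len_Y, res_i and res_rep are only illustrative, and which alternative fits depends on what output was actually intended (the question does not say).

```r
library(data.table)

test <- data.table(X = rep(1, 3), Y = rep("a", 3))

# Inside j, the grouping column X is a single value per group,
# while the non-grouping column Y keeps one element per row of the group.
lens <- test[, .(len_X = length(X), len_Y = length(Y)), keyby = .(X)]
lens

# Alternative 1: filter rows in i first, then group; no subsetting of X in j.
res_i <- test[Y == "a", .(n = .N), keyby = .(X)]
res_i

# Alternative 2: repeat the group value to the group size (.N) before
# applying the logical index, so both vectors have the same length.
res_rep <- test[, rep(X, .N)[Y == "a"], keyby = .(X)]
res_rep
```

In the first result len_X is 1 and len_Y is 3, which is the mismatch behind the NA values. Filtering in i is usually the simpler choice when the condition does not depend on per-group computations.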