select data.table R rows based on row number and condition

Question

In data.table, we can select rows based on row number or condition:

> x <- data.table(letters[1:4], 1:4)
> x
   V1 V2
1:  a  1
2:  b  2
3:  c  3
4:  d  4
> x[2]
   V1 V2
1:  b  2
> x[V1 == "d"]
   V1 V2
1:  d  4

However I can't select with both row number and condition:

> x[!2 & V2 > 1]
Empty data.table (0 rows) of 2 cols: V1,V2

This is probably because the !2 is not interpreted as row number in this format. I knew I can chain the two conditions:

> x[!2][V2 > 1]
   V1 V2
1:  c  3
2:  d  4

However I wanted to assign new column values for this subset

x[!2][V2 > 1, V3 := "more"]

Now it only created column for the intermediate chained data.table. I can save the intermediate table then merge back to the original table, but that will be cumbersome.

Actually I often feel data.table need a proper row number. .I is a dynamic number depend on group, but I wanted a unique id that can identify every row, this unique id can be very useful in merge/join (often the data don't have a unique id). If .i is the row number, I can use

x[(.i != 2) & (V2 >1), V3 := "more"]

I can simulate this by creating a row number column explicitly first.

Another method is to make the modifications on subset data.table apply back the original table. Suppose we have x as original table, x[!2] as a subset, then if modifications on x[!2] actually modify x, my question will be solved too. Of course this kind of subset need to be created in different way, for example x[!2, refOriginal = TRUE].

Found a discussion about real row number here, many shared similar opinions. — dracodoc

Psidom Psidom · Accepted Answer · 2016-09-10T04:11:36

Here are my two attempts toward the solution: The first one uses the summarize syntax of data.table to calculate a logical vector using row number .I and condition at position i for subsetting and updating columns; the second one uses which and setdiff to remove certain row numbers from the condition, if on the other hand you need an and operation of row number and condition, setdiff can be replaced with union:

x[x[, .I != 2 & V2 > 2], V3 := "more"]
x
#    V1 V2   V3
# 1:  a  1   NA
# 2:  b  2   NA
# 3:  c  3 more
# 4:  d  4 more


x[setdiff(which(V2 > 2), c(2)), V3 := "more"]
x
#    V1 V2   V3
# 1:  a  1   NA
# 2:  b  2   NA
# 3:  c  3 more
# 4:  d  4 more

select data.table R rows based on row number and condition

1 Answers