For example,
library(data.table)
set.seed(1984)
d <- data.table(name = letters[1:26], a = rbinom(26, 1, 0.5), b = rbinom(26, 1, 0.5), c = rbinom(26, 1, 0.5))
I can remove the rows where columns a, b, and c are all 0 with:
d[,if(sum(a,b,c) != 0) .SD,by=.(a,b,c)]
The result is:
a b c name
1: 1 1 1 a
2: 1 1 1 u
3: 1 1 1 x
4: 0 1 0 b
5: 0 1 0 d
6: 0 1 0 h
7: 0 1 1 c
8: 0 1 1 g
9: 0 1 1 o
10: 0 1 1 q
11: 0 1 1 t
12: 1 1 0 e
13: 1 1 0 k
14: 1 1 0 y
15: 1 0 0 f
16: 1 0 0 i
17: 1 0 0 r
18: 1 0 0 s
19: 1 0 0 w
20: 0 0 1 j
21: 0 0 1 v
22: 1 0 1 m
23: 1 0 1 n
a b c name
Now I have two questions:
- How can I keep the "name" column as the first column?
- How can I select columns a, b, and c with a short expression (something like a:c, although a:c does not mean columns a, b, and c here)? With hundreds of columns, I can't type out a, b, c, ... in the sum call or in the by argument.
Additional question:
If the function is not sum (which has a rowSums version for working across rows) but another function such as max, how can I solve questions 1 and 2 without the apply family of functions? (The apply family is designed for data frames, and I am afraid it will slow data.table down.)
d[d[, rowSums(.SD) != 0, .SDcols = a:c]]? - talat
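A minimal sketch that builds on the comment above, under a few assumptions: the columns are listed once in a character vector (the name `cols` is only illustrative), and for the max case `do.call(pmax, .SD)` is swapped in as a vectorised stand-in for a row-wise max. This is not necessarily the fastest approach, just one way to avoid both `apply` and typing every column name.

```r
library(data.table)

set.seed(1984)
d <- data.table(name = letters[1:26],
                a = rbinom(26, 1, 0.5),
                b = rbinom(26, 1, 0.5),
                c = rbinom(26, 1, 0.5))

# Columns to test, named once (illustrative variable). With hundreds of
# columns this could instead be something like setdiff(names(d), "name").
cols <- c("a", "b", "c")

# sum version: TRUE for rows whose selected columns are not all 0.
# Subsetting by a logical vector leaves the column order untouched,
# so "name" stays the first column.
keep_sum <- d[, rowSums(.SD) != 0, .SDcols = cols]
res_sum <- d[keep_sum]

# max version without apply(): pmax() works element-wise across its
# arguments, and do.call() passes each column of .SD as one argument.
keep_max <- d[, do.call(pmax, .SD) != 0, .SDcols = cols]
res_max <- d[keep_max]
```

Because the rows are filtered with a logical vector rather than grouped with `by`, the rows also stay in their original order instead of being rearranged by group.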