I want to add a column to a data.table which shows how many copies of each row exist. Take the following example:
library(data.table)
DT <- data.table(id = 1:10, colA = c(1,1,2,3,4,5,6,7,7,7), colB = c(1,1,2,3,4,5,6,7,8,8))
setkey(DT, colA, colB)
DT[, copies := length(colA), by = .(colA, colB)]
The output it gives is:
    id colA colB copies
 1:  1    1    1      1
 2:  2    1    1      1
 3:  3    2    2      1
 4:  4    3    3      1
 5:  5    4    4      1
 6:  6    5    5      1
 7:  7    6    6      1
 8:  8    7    7      1
 9:  9    7    8      1
10: 10    7    8      1
Desired output is:
    id colA colB copies
 1:  1    1    1      2
 2:  2    1    1      2
 3:  3    2    2      1
 4:  4    3    3      1
 5:  5    4    4      1
 6:  6    5    5      1
 7:  7    6    6      1
 8:  8    7    7      1
 9:  9    7    8      2
10: 10    7    8      2
How should I do it?
I also want to know why my approach doesn't work. Isn't it true that when you group by colA and colB, the first group should contain two rows of data? I understand if "length" is not the function to use, but I cannot think of any other. I thought of "nrow", but what would I pass to it?
Use .N. When you group by colA, in each group colA is just a single number. - eddi
DT[, copies := .N, by=.(colA,colB)] - Pierre L
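To make the two comments above concrete, here is a minimal sketch using the question's own data. The column names len and copies_alt are made up for illustration and are not part of the original question. It shows that length() on a grouping column returns 1 in every group, that .N gives the group size, and that length() on a non-grouping column such as id should also give the group size.

```r
library(data.table)

DT <- data.table(id   = 1:10,
                 colA = c(1, 1, 2, 3, 4, 5, 6, 7, 7, 7),
                 colB = c(1, 1, 2, 3, 4, 5, 6, 7, 8, 8))

# Inside j, a grouping column is a single value per group,
# so length(colA) is 1 for every group (the behaviour seen in the question).
DT[, len := length(colA), by = .(colA, colB)]

# .N is the number of rows in the current group.
DT[, copies := .N, by = .(colA, colB)]

# length() does count rows when given a non-grouping column such as id.
# (copies_alt is a hypothetical column name used only for comparison.)
DT[, copies_alt := length(id), by = .(colA, colB)]

print(DT)
```

.N is usually the clearer choice, since it does not depend on which column is passed in.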