
I want to add a column to a data.table which shows how many copies of each row exist. Take the following example:

library(data.table)
DT <- data.table(id = 1:10, colA = c(1,1,2,3,4,5,6,7,7,7), colB = c(1,1,2,3,4,5,6,7,8,8))
setkey(DT, colA, colB)
DT[, copies := length(colA), by = .(colA, colB)]

The output I get is:

   id colA colB copies
 1:  1    1    1      1
 2:  2    1    1      1
 3:  3    2    2      1
 4:  4    3    3      1
 5:  5    4    4      1
 6:  6    5    5      1
 7:  7    6    6      1
 8:  8    7    7      1
 9:  9    7    8      1
10: 10    7    8      1

The desired output is:

   id colA colB copies
 1:  1    1    1      2
 2:  2    1    1      2
 3:  3    2    2      1
 4:  4    3    3      1
 5:  5    4    4      1
 6:  6    5    5      1
 7:  7    6    6      1
 8:  8    7    7      1
 9:  9    7    8      2
10: 10    7    8      2

How should I do it?

I also want to know why my approach doesn't work. Isn't it true that when you group by colA and colB, the first group should contain two rows of data? I understand if "length" is not the right function to use, but I cannot think of any other. I thought of "nrow", but what can I pass to it?

Use .N. When you group by colA, colA inside each group is just a single number, so length(colA) is always 1. - eddi
As in DT[, copies := .N, by=.(colA,colB)] - Pierre L
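To see what eddi means, here is a small diagnostic sketch. It reuses DT from the question (without the setkey call, which does not change the group sizes), and the output column names len_colA and len_id are purely illustrative. It assumes data.table's documented behaviour that the by columns are visible inside j as length-1 values, so len_colA should be 1 in every group while len_id (a non-grouping column) and .N both give the actual group size.

```r
library(data.table)

# Same data as in the question
DT <- data.table(id = 1:10,
                 colA = c(1, 1, 2, 3, 4, 5, 6, 7, 7, 7),
                 colB = c(1, 1, 2, 3, 4, 5, 6, 7, 8, 8))

# Compare three candidate "counts" inside the same grouped j:
#  - length(colA): colA is a grouping column, so it is a single value per group
#  - length(id):   id is not a grouping column, so it holds every row of the group
#  - .N:           data.table's built-in row count for the current group
chk <- DT[, .(len_colA = length(colA),
              len_id   = length(id),
              N        = .N),
          by = .(colA, colB)]
print(chk)
```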

1 Answer

DT[, copies := .N, by=.(colA,colB)]
#     id colA colB copies
#  1:  1    1    1      2
#  2:  2    1    1      2
#  3:  3    2    2      1
#  4:  4    3    3      1
#  5:  5    4    4      1
#  6:  6    5    5      1
#  7:  7    6    6      1
#  8:  8    7    7      1
#  9:  9    7    8      2
# 10: 10    7    8      2

As mentioned in the comments, .N is a special symbol that, inside j, holds the number of rows in the current group, where the groups are defined by the by argument.
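On the nrow question: inside j, .SD is the subset of the non-grouping columns for the current group, so nrow(.SD) also returns the group size. The sketch below, again using the question's data, computes the count both ways; copies_sd is a hypothetical column name added only for comparison. .N is generally the better choice, since it is a per-group count data.table already has and does not require building .SD for each group.

```r
library(data.table)

# Same data as in the question (row order is preserved by := with by)
DT <- data.table(id = 1:10,
                 colA = c(1, 1, 2, 3, 4, 5, 6, 7, 7, 7),
                 colB = c(1, 1, 2, 3, 4, 5, 6, 7, 8, 8))

# Idiomatic: .N is the number of rows in each (colA, colB) group
DT[, copies := .N, by = .(colA, colB)]

# Alternative: .SD holds the group's non-grouping columns, so its row count
# equals the group size (works, but materialises .SD for every group)
DT[, copies_sd := nrow(.SD), by = .(colA, colB)]

print(DT)
```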