R DataTable Solution To Making Sum Columns

Question

data1=data.frame("group1"=c(1,1,1,1,2,2,2,2,3,3,3,3,1,1,1,1,2,2,2,2,3,3,3,3),
                 "group2"=c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2),
                 "var1"=c(1,0,0,1,0,0,0,1,1,1,1,1,0,0,1,0,0,1,0,1,1,0,0,0),
                 "var2"=c(1,0,1,1,0,0,1,0,0,0,0,0,0,0,1,1,0,0,1,0,1,0,0,1),
                 "var3"=c(1,1,4,3,3,1,1,2,4,1,4,4,4,2,1,2,1,2,2,2,3,1,2,4))


data2=data.frame("group1"=rep(c(rep(1:3,2)),2),
                 "group2"=rep(c(rep(1:2,3))),
                 "var1"=sort(rep(0:1,6)),
                 "svar1" = c(2,2,0,3,3,3,1,2,4,1,1,1),
                 "var2"=sort(rep(0:1,6)),
                 "svar2" = c(rep(NA,12)))

I have 'data1' and hope to make 'data2'. What it does is it collapses the actual counts of 'var1' and 'var2' to create 'svar1' and 'svar2' in 'data2'.

to create 'svar1' we sift through all combinations of 'group1' and 'group2' in 'data1' and then just store the sum of all occurances of '0' and '1' which are the response options for 'var1'. I wish to also do this for 'var2' to generate 'svar2'

I also hope for a data.table solution given the big data!! For now we can ignore 'var3'!

Gregor Thomas Gregor Thomas · Accepted Answer · 2020-03-16T02:21:31

Here's a data.table based approach with joins. I think, in the result, the supposition is that var1 = var2 on each row, and svar1 and svar2 are the row counts in the original data frame with those combinations.

I'll leave duplicating the var1 column and filling the NAs in with 0s to you.

setDT(data1)

merge(
  data1[, .(svar1 = .N), by = .(group1, group2, var1)],
  data1[, .(svar2 = .N), by = .(group1, group2, var2)],
  by.x = c("group1", "group2", "var1"),
  by.y = c("group1", "group2", "var2"),
  all = TRUE
)
#     group1 group2 var1 svar1 svar2
#  1:      1      1    0     2     1
#  2:      1      1    1     2     3
#  3:      1      2    0     3     2
#  4:      1      2    1     1     2
#  5:      2      1    0     3     3
#  6:      2      1    1     1     1
#  7:      2      2    0     2     3
#  8:      2      2    1     2     1
#  9:      3      1    0    NA     4
# 10:      3      1    1     4    NA
# 11:      3      2    0     3     2
# 12:      3      2    1     1     2

R DataTable Solution To Making Sum Columns

1 Answers