1
votes

I have the following data.frame (df):

ID1 ID2 Col1 Col2 Col3 Grp
A   B   1    3    6    G1
C   D   3    5    7    G1
E   F   4    5    7    G2
G   h   5    6    8    G2

What I would like to achieve is the following: group by Grp (easy), and then summarize so that for each group I sum the numeric columns and create string columns containing all the ID1s and ID2s.

It would be something like this:

df %>% 
   group_by(Grp) %>% 
      summarize(ID1s=toString(ID1), ID2s=toString(ID2), Col1=sum(Col1), Col2=sum(Col2), Col3=sum(Col3))

Everything is fine when I know the number of columns (Col1, Col2, Col3). However, I would like it to work for a data frame in which ID1, ID2 and Grp are always present under those names, alongside any number of additional numeric columns with unknown names.

Is there a way to do it in dplyr?

2
Have you tried? summarise_at(vars(starts_with("Col")), sum) – Pierre L
How would you use it together with the other columns that also need to be summarized, and with possibly different/unknown names instead of Col1, Col2, etc.? – kwicher
If the possible names are unknown, how would you suggest a computer find them? – Pierre L
Following up on Pierre's comment, what is distinct about the columns you want to reference? Is the Grp column always going to be the last column? In that case you'd be looking for a way to reference all columns except the last one. Think about how you know what you're looking for as a human and then we can work on getting the computer to find it. – Andrew Brēza
For the record, it isn't case sensitive by default – Pierre L

2 Answers

4
votes

I would like to be able to implement it so that it would work for a data frame with known and always named the same ID1, ID2, Grp, and any number of additional numeric column with unknown names.

You can overwrite the ID columns first and then group by them as well:

DF %>% 
  group_by(Grp) %>% 
  mutate_each(funs(. %>% unique %>% sort %>% toString), ID1, ID2) %>% 
  group_by(ID1, ID2, add=TRUE) %>% 
  summarise_each(funs(sum))

# Source: local data frame [2 x 6]
# Groups: Grp, ID1 [?]
# 
#     Grp   ID1   ID2  Col1  Col2  Col3
#   (chr) (chr) (chr) (int) (int) (int)
# 1    G1  A, C  B, D     4     8    13
# 2    G2  E, G  F, h     9    11    15

I think you'll want to deduplicate and sort the IDs before collapsing them to a string, so I've added those steps.
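For readers on dplyr 1.0.0 or later, where mutate_each(), summarise_each() and funs() are deprecated, the same idea can be sketched with across(). This is not the code from the answer above, just a rough equivalent under a few assumptions: the data frame is rebuilt by hand from the question's table, ID1 and ID2 are character columns, and every other non-grouping column is numeric. The name result is only illustrative.

```r
library(dplyr)

# Example data rebuilt from the table in the question
df <- data.frame(
  ID1  = c("A", "C", "E", "G"),
  ID2  = c("B", "D", "F", "h"),
  Col1 = c(1, 3, 4, 5),
  Col2 = c(3, 5, 5, 6),
  Col3 = c(6, 7, 7, 8),
  Grp  = c("G1", "G1", "G2", "G2"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Grp) %>%
  summarise(
    # Collapse the known ID columns: deduplicate, sort, join with ", "
    across(c(ID1, ID2), ~ toString(sort(unique(.x)))),
    # Sum every numeric column, whatever it happens to be called
    across(where(is.numeric), sum),
    .groups = "drop"
  )

print(result)
```

Because the numeric columns are picked by type with where(is.numeric), the sketch should not need to know their names or how many there are. It would need adjusting if an ID column were numeric, since that column would then be summed as well.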

0
votes

Using data.table, you could try the following:

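   library(data.table)
   setDT(df)
   # columns to sum: everything except ID1, ID2 and Grp
   sd_cols <- setdiff(names(df), c("ID1", "ID2", "Grp"))
   merge(df[, .(ID1s = toString(ID1), ID2s = toString(ID2)), by = Grp],
         df[, lapply(.SD, sum), by = Grp, .SDcols = sd_cols], by = "Grp")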