0
votes

I have a data frame with two columns. I want to combine the values in column GO into a single cell, separated by commas, for all rows that share the same value in column targets.

head(a)

                           targets         GO
1 TRINITY_GG_100008_c0_g1_i8.mrna1 GO:0030515
2 TRINITY_GG_100008_c0_g1_i8.mrna1 GO:0030515
3 TRINITY_GG_100016_c0_g1_i1.mrna1 GO:0003996
4 TRINITY_GG_100016_c0_g1_i1.mrna1 GO:0004467
5 TRINITY_GG_100016_c0_g1_i1.mrna1 GO:0047676
6 TRINITY_GG_100016_c0_g1_i1.mrna1 GO:0102391

> dput(a)
structure(list(targets = c("TRINITY_GG_100008_c0_g1_i8.mrna1", 
"TRINITY_GG_100008_c0_g1_i8.mrna1", "TRINITY_GG_100016_c0_g1_i1.mrna1", 
"TRINITY_GG_100016_c0_g1_i1.mrna1", "TRINITY_GG_100016_c0_g1_i1.mrna1", 
"TRINITY_GG_100016_c0_g1_i1.mrna1"), GO = c("GO:0030515", "GO:0030515", 
"GO:0003996", "GO:0004467", "GO:0047676", "GO:0102391")), row.names = c(NA, 
6L), class = "data.frame")

This is what I have tried so far using dplyr, but it does not work yet.

a %>% group_by(targets) %>%
    summarize(GO, sep","))

I want the results to look like this:

                         targets         GO
TRINITY_GG_100008_c0_g1_i8.mrna1       GO:0030515, GO:0030515 
TRINITY_GG_100016_c0_g1_i1.mrna1       GO:0003996, GO:0004467, GO:0047676, GO:0102391

Hope someone can help me! Thanks

1
Look at the toString function. - A5C1D2H2I1M1N2O1R2T1
Thanks, I just solved it. I guess I posted the question too fast. a %>% group_by(targets) %>% summarise_all(funs(paste(na.omit(.), collapse = ","))) - Amaranta_Remedios
You can self-answer the question and mark it as accepted to show it's been resolved. - A5C1D2H2I1M1N2O1R2T1

1 Answer

0
votes

Fortunately, this was solved quite quickly. Here is the answer in case anyone has the same problem in the future.

library(dplyr)
# Collapse the non-NA GO terms of each target into one comma-separated string
a %>% group_by(targets) %>% 
    summarise_all(funs(paste(na.omit(.), collapse = ",")))
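Note that funs() has been deprecated since dplyr 0.8.0, and summarise_all() is superseded by across() in dplyr 1.0.0, so the code above may emit warnings on newer installations. A minimal sketch of the same idea, assuming dplyr 1.0.0 or later and using toString() as suggested in the comments (it joins with ", ", which matches the desired output in the question). The name result is only illustrative:

```r
library(dplyr)

# Sample data reproduced from the question
a <- data.frame(
  targets = c(rep("TRINITY_GG_100008_c0_g1_i8.mrna1", 2),
              rep("TRINITY_GG_100016_c0_g1_i1.mrna1", 4)),
  GO = c("GO:0030515", "GO:0030515", "GO:0003996",
         "GO:0004467", "GO:0047676", "GO:0102391")
)

# One row per target; toString() pastes the values together with ", "
result <- a %>%
  group_by(targets) %>%
  summarise(GO = toString(na.omit(GO)), .groups = "drop")

result
```

If repeated GO terms within a target should appear only once, wrapping the column in unique(), as in toString(unique(na.omit(GO))), should do it.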