Summarizing multiple columns based on one column in R

Question

I have a data frame which looks something like this:

TCGA_Name	Full_Name	Gene.Name
Thyroid Carcinoma	Papillary Thyroid Cancer	NRAS
Thyroid Carcinoma	Thyroid Gland Carcinoma	NRAS
Sarcoma	Uterine leiomyosarcoma	PIK3CA
Sarcoma	Sarcoma	PIK3CA
Ovarian Serous Cystadenocarcinoma	High Grade Serous Ovarian Cancer	PIK3CA

I'm trying to reduce the number of rows based on TCGA_Name. I want to combine the Full_Name cancer types if they have the same TCGA_Name and share the same Gene.Name. The final result should look like this:

TCGA_Name	Full_Name	Gene.Name
Thyroid Carcinoma	Papillary Thyroid Cancer, Thyroid Gland Carcinoma	NRAS
Sarcoma	Uterine leiomyosarcoma, Sarcoma	PIK3CA
Ovarian Serous Cystadenocarcinoma	High Grade Serous Ovarian Cancer	PIK3CA

So far I've managed this:

library(plyr)
df1 <- ddply(df1, .(TCGA_Name), summarize, text=paste(Full_Name, collapse=", "))

But this drops the Gene.Name column.

As always, any help is really appreciated!

G. Can · Accepted Answer · 2021-08-12T15:06:40

Is this what you want?

df1 <- ddply(df1, .(TCGA_Name,Gene.Name), summarize, text=paste(Full_Name, collapse=", "))

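Just add Gene.Name to the grouping variables. ddply keeps only the grouping columns plus whatever summarize computes, so any column you want to keep has to be part of the grouping.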
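
For anyone who wants to try the grouping idea without installing plyr, here is a minimal sketch that swaps in base R's aggregate() instead of ddply. The data frame is rebuilt by hand from the sample rows in the question, and the name result is just an illustrative choice, not something from the original post.

```r
# Rebuild the sample data from the question (values copied from the table shown there).
df1 <- data.frame(
  TCGA_Name = c("Thyroid Carcinoma", "Thyroid Carcinoma", "Sarcoma", "Sarcoma",
                "Ovarian Serous Cystadenocarcinoma"),
  Full_Name = c("Papillary Thyroid Cancer", "Thyroid Gland Carcinoma",
                "Uterine leiomyosarcoma", "Sarcoma",
                "High Grade Serous Ovarian Cancer"),
  Gene.Name = c("NRAS", "NRAS", "PIK3CA", "PIK3CA", "PIK3CA"),
  stringsAsFactors = FALSE
)

# Group by TCGA_Name and Gene.Name together, then collapse the Full_Name
# values of each group into one comma-separated string.
result <- aggregate(Full_Name ~ TCGA_Name + Gene.Name, data = df1,
                    FUN = paste, collapse = ", ")

# Put the columns back in the order used in the question.
result <- result[, c("TCGA_Name", "Full_Name", "Gene.Name")]
print(result)
```

The row order of the output may differ from the order in the question, since aggregate() sorts by the grouping variables. Naming the summarized column Full_Name (rather than text, as in the ddply call) keeps the original column names.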
