I have a data frame which looks something like this:
TCGA_Name | Full_Name | Gene.Name |
---|---|---|
Thyroid Carcinoma | Papillary Thyroid Cancer | NRAS |
Thyroid Carcinoma | Thyroid Gland Carcinoma | NRAS |
Sarcoma | Uterine leiomyosarcoma | PIK3CA |
Sarcoma | Sarcoma | PIK3CA |
Ovarian Serous Cystadenocarcinoma | High Grade Serous Ovarian Cancer | PIK3CA |
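For reproducibility, the example table above can be built roughly like this. This is only a sketch: `df1` is the name I use in my code further down, and `stringsAsFactors = FALSE` is my assumption so the columns stay as plain character vectors.

```r
# Minimal reproducible version of the example table (values copied from the table above).
df1 <- data.frame(
  TCGA_Name = c("Thyroid Carcinoma", "Thyroid Carcinoma",
                "Sarcoma", "Sarcoma",
                "Ovarian Serous Cystadenocarcinoma"),
  Full_Name = c("Papillary Thyroid Cancer", "Thyroid Gland Carcinoma",
                "Uterine leiomyosarcoma", "Sarcoma",
                "High Grade Serous Ovarian Cancer"),
  Gene.Name = c("NRAS", "NRAS", "PIK3CA", "PIK3CA", "PIK3CA"),
  stringsAsFactors = FALSE  # assumption: keep strings as character, not factor
)
```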
I'm trying to reduce the number of rows based on TCGA_Name. I want to combine the Full_Name cancer types into one row when they share the same TCGA_Name and the same Gene.Name. The final product should look like this:
TCGA_Name | Full_Name | Gene.Name |
---|---|---|
Thyroid Carcinoma | Papillary Thyroid Cancer, Thyroid Gland Carcinoma | NRAS |
Sarcoma | Uterine leiomyosarcoma, Sarcoma | PIK3CA |
Ovarian Serous Cystadenocarcinoma | High Grade Serous Ovarian Cancer | PIK3CA |
So far I've managed this:
```r
library(plyr)
# Collapse Full_Name within each TCGA_Name group into one comma-separated string
df1 <- ddply(df1, .(TCGA_Name), summarize, text = paste(Full_Name, collapse = ", "))
```
but this drops the Gene.Name column.
As always, any help is really appreciated!