I have a data frame which looks something like this:

TCGA_Name                          Full_Name                         Gene.Name
Thyroid Carcinoma                  Papillary Thyroid Cancer          NRAS
Thyroid Carcinoma                  Thyroid Gland Carcinoma           NRAS
Sarcoma                            Uterine leiomyosarcoma            PIK3CA
Sarcoma                            Sarcoma                           PIK3CA
Ovarian Serous Cystadenocarcinoma  High Grade Serous Ovarian Cancer  PIK3CA

I'm trying to reduce the number of rows based on TCGA_Name. I want to combine the Full_Name cancer types into one row if they have the same TCGA_Name and share the same Gene.Name. The final product should look like this:

TCGA_Name                          Full_Name                                          Gene.Name
Thyroid Carcinoma                  Papillary Thyroid Cancer, Thyroid Gland Carcinoma  NRAS
Sarcoma                            Uterine leiomyosarcoma, Sarcoma                    PIK3CA
Ovarian Serous Cystadenocarcinoma  High Grade Serous Ovarian Cancer                   PIK3CA

So far I've managed this:

library(plyr)
df1 <- ddply(df1, .(TCGA_Name), summarize, text=paste(Full_Name, collapse=", "))

But this drops the Gene.Name column.

As always, any help is really appreciated!

1 Answer

Is this what you want?

df1 <- ddply(df1, .(TCGA_Name,Gene.Name), summarize, text=paste(Full_Name, collapse=", "))

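Just add Gene.Name to the grouping variables. ddply keeps every grouping column in its output, so Gene.Name is no longer dropped.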
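
If you would rather avoid plyr, the same grouping can probably be done with base R's aggregate(). This is only a sketch of an alternative, not the approach used above, and it assumes your real df1 looks like the sample rows in the question (the data frame below is rebuilt from them, and the name collapsed is just illustrative):

```r
# Sample data rebuilt from the question; assumed to match the real df1
df1 <- data.frame(
  TCGA_Name = c("Thyroid Carcinoma", "Thyroid Carcinoma", "Sarcoma", "Sarcoma",
                "Ovarian Serous Cystadenocarcinoma"),
  Full_Name = c("Papillary Thyroid Cancer", "Thyroid Gland Carcinoma",
                "Uterine leiomyosarcoma", "Sarcoma",
                "High Grade Serous Ovarian Cancer"),
  Gene.Name = c("NRAS", "NRAS", "PIK3CA", "PIK3CA", "PIK3CA"),
  stringsAsFactors = FALSE
)

# Group by both TCGA_Name and Gene.Name, joining the Full_Name values per group
collapsed <- aggregate(Full_Name ~ TCGA_Name + Gene.Name, data = df1,
                       FUN = function(x) paste(x, collapse = ", "))

# Restore the column order used in the question
collapsed <- collapsed[, c("TCGA_Name", "Full_Name", "Gene.Name")]
print(collapsed)
```

Note that aggregate() sorts the result by the grouping columns, so the rows may come back in a different order than in the input.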