3
votes

I have a dataframe:

gene=c("Esr", "Esr", "Esr", "Nop", "Nop", "Nop", "Stu", "Mkp", "Mkp", "P53", "Ard", "Ard")
int_1=c(34,56,544,566,123,00,343,56,22,10,11,19)
int_2=c(24,26,58,56,13,00,34,6,22,10,119,109)
int_3=c(14,36,54,566,12,00,43,56,00,770,11,119)
df1 = cbind.data.frame(gene, int_1, int_2, int_3)
  1. df1 is 26000 rows long and 36 columns wide.
  2. I want to make a new df2, where column "gene" is looked for unique strings/text and all values in the rows are summed together for corresponding individual intensity columns.
  3. In df1 the gene names appear multiple times. The df2 will have each gene only once.

I am trying to use tidyverse packages so a solution using those will be very much appreciated (if possible). Thank you so much.

1

1 Answers

3
votes

We can use dplyr::summarise_all

(1) to average values

library(tidyverse)
df2 <- df1 %>%
    group_by(gene) %>%
    summarise_all(mean)
df2;
## A tibble: 6 x 4
#  gene  int_1 int_2 int_3
#  <fct> <dbl> <dbl> <dbl>
#1 Ard    15.0  114.  65.0
#2 Esr   211.    36.  34.7
#3 Mkp    39.0   14.  28.0
#4 Nop   230.    23. 193.
#5 P53    10.0   10. 770.
#6 Stu   343.    34.  43.0

(2) to sum values

df2 <- df1 %>%
    group_by(gene) %>%
    summarise_all(sum)
df2;
## A tibble: 6 x 4
#  gene  int_1 int_2 int_3
#  <fct> <dbl> <dbl> <dbl>
#1 Ard     30.  228.  130.
#2 Esr    634.  108.  104.
#3 Mkp     78.   28.   56.
#4 Nop    689.   69.  578.
#5 P53     10.   10.  770.
#6 Stu    343.   34.   43.

Or in base R you can use aggregate

aggregate(cbind(int_1, int_2, int_3) ~ gene, data = df1, sum)
#  gene int_1 int_2 int_3
#1  Ard    30   228   130
#2  Esr   634   108   104
#3  Mkp    78    28    56
#4  Nop   689    69   578
#5  P53    10    10   770
#6  Stu   343    34    43