1
votes

I am using R studio but I am an almost complete newbie so excuse my possibly fundamental question

I have a large df with raw names (A,B,C) and column names containing names of samples in duplicates. It looks like this:

   MNSA.1 MNSA.2 NSDW.1 NSDW.2 NAR.1 NAR.2
A    7      8     9     5      3      2
B    7      4     3     8      8      5
C    3      3     4     7      8      3

I do not have any NAs or missing values (although I do have 0s), but from this df, I need to create another dataframe containing the average between the duplicates (".1" and ".2") . It would look like this:

   MNSA  NSDW  NAR
A   7.5   7    2.5
B   5.5   5.5  6.5
C    3    5.5  5.5

So each value in the resulting df (ex. "7") is an average between its corresponding .1 and .2 columns (NSDW.1, row A: "9" and NSDW.2 row A: "5") for each row in the original df.

Is this even possible? Please ask if you need clarification. Any help will be most appreciated.

Thank you!

1

1 Answers

0
votes

We can split by the column names (after removing the suffix part) into a list and get the rowMeans

sapply(split.default(df1, sub("\\.\\d+", "", names(df1))), rowMeans)
#   MNSA NAR NSDW
#A  7.5 2.5  7.0
#B  5.5 6.5  5.5
#C  3.0 5.5  5.5

Or we can use dplyr/tidyr methods

library(dplyr)
library(tidyr)
tibble::rownames_to_column(df1, var = "rn") %>% 
     gather(Var, Val, -rn) %>%
     separate(Var, into = c("Var1", "Var2")) %>% 
     group_by(rn, Var1) %>% 
     summarise(Val = mean(Val)) %>%
     spread(Var1, Val)