1
votes

So I have a data frame , simplified as this:

ID A B C
1  1 5 0 
2  3 0 3
3  0 2 1
2  5 9 1 
3  3 5 3
1  2 6 4

Simply put, I want to calculate the following for each row:

  • Mean
  • Median
  • Max
  • Min

Easy enough, but the hard part for me is after taking each, how do I create a mean value to represent each ID.

So after I get these values, how do I show the AVERAGE MEAN/MED/MAX/MIN for each ID???

Expected output:

(1)

ID  Mean Median Min Max
1      2      1   0   5
2      2      3   0   3
3      1      1   0   2
2      5      5   1   9
3   3.66      3   3   5
1      4      4   2   6   

(2)

ID  AvgMean AvgMedian AvgMin AvgMax
1         3       2.5      1    5.5  
2       3.5         4      1      6 
3      2.33         3      3    3.5
3
The way you describe this problem, it sounds like you are trying to take the mean after taking the mean for each row... which is sort of a weird statistic?hachiko

3 Answers

2
votes

You can try something like this:

   library(dplyr)
   df %>% 
   group_by(ID) %>%
   summarise(mean_ = mean(c_across(A:C), na.rm = T),
             medi_ = median(c_across(A:C), na.rm = T),
             max_  = max(c_across(A:C), na.rm = T),
             min_  = min(c_across(A:C), na.rm = T))
    
    `summarise()` ungrouping output (override with `.groups` argument)
    # A tibble: 3 x 5
         ID mean_ medi_  max_  min_
      <int> <dbl> <dbl> <int> <int>
    1     1  3      3       6     0
    2     2  3.5    3       9     0
    3     3  2.33   2.5     5     0

For the second part:

df %>% 
   rowwise() %>%
   summarise(mean_ = mean(c_across(A:C), na.rm = T),
             medi_ = median(c_across(A:C), na.rm = T),
             max_  = max(c_across(A:C), na.rm = T),
             min_  = min(c_across(A:C), na.rm = T))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 4
  mean_ medi_  max_  min_
  <dbl> <int> <int> <int>
1  2        1     5     0
2  2        3     3     0
3  1        1     2     0
4  5        5     9     1
5  3.67     3     5     3
6  4        4     6     2

With data:

df <- structure(list(ID = c(1L, 2L, 3L, 2L, 3L, 1L), A = c(1L, 3L, 
0L, 5L, 3L, 2L), B = c(5L, 0L, 2L, 9L, 5L, 6L), C = c(0L, 3L, 
1L, 1L, 3L, 4L)), class = "data.frame", row.names = c(NA, -6L
)) 
0
votes

thanks for posting expected output. I would consider using summarize and across together

library(dplyr)

df <- df %>%
group_by(ID)
summarize(across(2:4, mean))
0
votes

In base R, the following seems to do what the question asks for.

out is a data.frame of the statistics ungrouped and out2 of the grouped ones.

fun <- function(X){
  f <- function(x, na.rm = FALSE){
    c(
      Mean = mean(x, na.rm = na.rm),
      Median = median(x, na.rm = na.rm),
      Min = min(x, na.rm = na.rm),
      Max = max(x, na.rm = na.rm)
    )
  }
  t(apply(X, 1, f))
}

out <- lapply(split(df1[-1], df1$ID), fun)
out2 <- lapply(out, colMeans)

out <- do.call(rbind, out)
out <- cbind.data.frame(ID = row.names(out), out)
out2 <- cbind.data.frame(ID = names(out2), do.call(rbind, out2))

out
#  ID     Mean Median Min Max
#1  1 2.000000      1   0   5
#6  6 4.000000      4   2   6
#2  2 2.000000      3   0   3
#4  4 5.000000      5   1   9
#3  3 1.000000      1   0   2
#5  5 3.666667      3   3   5


out2
#  ID     Mean Median Min Max
#1  1 3.000000    2.5 1.0 5.5
#2  2 3.500000    4.0 0.5 6.0
#3  3 2.333333    2.0 1.5 3.5