I have a dataframe that looks like so:

ID | One  | Two  | Three
A  | 0.6  | 0.4  | 0.2
B  | 0.2  | 0.32 | 0.12
C  | 0.1  | 0.24 | 0.14
D  | 0.62 | 0.12 | 0.19

I would like to add three new columns: the average of the values per ID, the minimum value per ID, and the name of the column that holds that minimum.

Output will look something like this:

ID | One  | Two  | Three | Avg  | Min  | Min Header
A  | 0.6  | 0.4  | 0.2   | 0.4  | 0.2  | Three
B  | 0.2  | 0.32 | 0.12  | 0.21 | 0.12 | Three
C  | 0.1  | 0.24 | 0.14  | 0.16 | 0.1  | One
D  | 0.62 | 0.12 | 0.19  | 0.31 | 0.12 | Two

I am currently using group_by(ID) %>% summarise(avg = (col1 + col2 + col3) / 3, min = pmin(col1, col2, col3)) to create a new dataframe, but I don't know how to pull the column header in as a new column within my group_by() %>% pipeline.

Any help would be greatly appreciated!

3 Answers


Here is an option with dplyr. Negating the numeric columns turns each row's minimum into its maximum, so max.col returns the index of the minimum column for each row, and that index is used to look up the 'MinHeader'. rowMeans and pmin then give the mean and min per row.

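library(dplyr)

df1 %>%
  mutate(
    # negate so max.col() finds the row minimum; ties.method = "first"
    # makes ties deterministic (the default breaks them at random)
    MinHeader = names(.)[-1][max.col(-.[-1], ties.method = "first")],
    Avg = rowMeans(.[2:4], na.rm = TRUE),
    # base-R do.call() avoids needing purrr::invoke()
    Min = do.call(pmin, .[2:4])
  )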

Output

# ID  One  Two Three MinHeader       Avg  Min
#1  A 0.60 0.40  0.20     Three 0.4000000 0.20
#2  B 0.20 0.32  0.12     Three 0.2133333 0.12
#3  C 0.10 0.24  0.14       One 0.1600000 0.10
#4  D 0.62 0.12  0.19       Two 0.3100000 0.12

data

df1 <- structure(list(ID = c("A", "B", "C", "D"), One = c(0.6, 0.2, 
0.1, 0.62), Two = c(0.4, 0.32, 0.24, 0.12), Three = c(0.2, 0.12, 
0.14, 0.19)), class = "data.frame", row.names = c(NA, -4L))
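
As a quick sanity check of the negation trick, here is a minimal base-R sketch (the names m, idx_neg, idx_min and min_header are just illustrative, and it assumes the same four rows as df1). It compares max.col on the negated matrix against a plain row-wise which.min:

```r
# Same values as df1 above, rebuilt so the snippet runs on its own
df1 <- data.frame(
  ID = c("A", "B", "C", "D"),
  One = c(0.6, 0.2, 0.1, 0.62),
  Two = c(0.4, 0.32, 0.24, 0.12),
  Three = c(0.2, 0.12, 0.14, 0.19)
)

m <- as.matrix(df1[-1])                        # numeric columns only

# Index of the largest negated value in each row, i.e. the row minimum
idx_neg <- max.col(-m, ties.method = "first")

# Row-wise which.min, used here only as a cross-check
idx_min <- unname(apply(m, 1, which.min))

# Translate column positions into column names
min_header <- colnames(m)[idx_neg]
```

Both index vectors should agree for this data. max.col is vectorised over rows, which is why it is used here instead of apply.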

If you are on dplyr 1.0.0 or above, you can use rowwise with c_across:

library(dplyr)

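df1 %>%
  rowwise() %>%
  mutate(Avg = mean(c_across(One:Three), na.rm = TRUE), 
         Min = min(c_across(One:Three), na.rm = TRUE), 
         # names(.)[-1] drops ID so positions line up with One:Three
         Min_header = names(.)[-1][which.min(c_across(One:Three))]) %>%
  ungroup()  # drop the rowwise grouping so later verbs work column-wise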

#  ID      One   Two Three   Avg   Min Min_header
#  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>     
#1 A      0.6   0.4   0.2  0.4    0.2  Three     
#2 B      0.2   0.32  0.12 0.213  0.12 Three     
#3 C      0.1   0.24  0.14 0.16   0.1  One       
#4 D      0.62  0.12  0.19 0.31   0.12 Two       

Here is another dplyr approach that works on a temporary matrix of the value columns:

library(dplyr)
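df1 %>%
  mutate(
    mat = as.matrix(across(One:Three)), # create a temporary matrix that only contains columns One to Three
    mincol = max.col(-mat, ties.method = "first"), # column index of each row's minimum; ties go to the first column
    Avg = rowMeans(mat, na.rm = TRUE), 
    Min = mat[cbind(seq_len(n()), mincol)], # pick one element per (row, column) pair
    MinHeader = colnames(mat)[mincol],
    mat = NULL, mincol = NULL # drop the temporary helper columns
  )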

Output

  ID  One  Two Three       Avg  Min MinHeader
1  A 0.60 0.40  0.20 0.4000000 0.20     Three
2  B 0.20 0.32  0.12 0.2133333 0.12     Three
3  C 0.10 0.24  0.14 0.1600000 0.10       One
4  D 0.62 0.12  0.19 0.3100000 0.12       Two
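
The least obvious step is probably mat[cbind(row, col)]: indexing a matrix with a two-column matrix returns one element per (row, column) pair rather than a sub-matrix. A small base-R sketch of just that step, assuming the same values as above (picked and row_min are illustrative names):

```r
# Columns One, Two, Three; matrix() fills column by column
mat <- matrix(
  c(0.6, 0.2, 0.1, 0.62,
    0.4, 0.32, 0.24, 0.12,
    0.2, 0.12, 0.14, 0.19),
  ncol = 3,
  dimnames = list(NULL, c("One", "Two", "Three"))
)

# Column position of each row's minimum
mincol <- max.col(-mat, ties.method = "first")

# Two-column index matrix: row i paired with mincol[i]
picked <- mat[cbind(seq_len(nrow(mat)), mincol)]

# Cross-check against a plain row-wise minimum
row_min <- apply(mat, 1, min)
```

picked and row_min should hold the same four values, so Min could also be computed with pmin. The index-matrix form just reuses the position already found for MinHeader.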