minimum (or maximum) value of each row across multiple columns

3

votes

I am looking for a solution for min(or max) value for each row of columns. Like:

# my data.frame is df:

library(tibble)
df <- tribble(
~name, ~type_1, ~type_2, ~type_3,
"a",   1,   5, 2,
"b",   2,   2, 6,
"c",   3,   8, 2
)

# and output should be result_df:

result_df <- tribble(
~name, ~type_1, ~type_2, ~type_3, ~min_val, ~min_col,
"a",   1,          5,     2,          1, "type_1",
"b",   8,          2,     6,          2, "type_2",
"c",   3,          8,     0,          0 ,"type_3"
)

I tried rowwise and pmax function but it did not work. I can use gather and grouping but I want to know is there column/row-wise solution.

This approach will be also useful for mean, median functions.

Thanks for your help.

r dplyr

2

votes

Is there something wrong with a base R approach?

# find the columns in question
mask <- colnames(df)[startsWith(colnames(df), 'type_')]

# apply row-wise and transpose afterwards
df[c('min_val', 'min_col')] <- t(apply(df[mask], 1, function(x) {
  m <- which.min(x)
  (y <- c(x[m], mask[m]))
}))

This yields

# A tibble: 3 x 6
  name  type_1 type_2 type_3 min_val min_col
  <chr>  <dbl>  <dbl>  <dbl> <chr>   <chr>  
1 a         1.     5.     2. 1       type_1 
2 b         2.     2.     6. 2       type_1 
3 c         3.     8.     2. 2       type_3

Note that which.min() takes the first found match (there are two 2 in the second row).

2

votes

A fairly generalizable approach is to reshape to temporarily reshape to long form, which makes the calculations easier—an ordinary grouped mutate.

library(tidyr)
library(dplyr)

df <- tribble(
    ~name, ~type_1, ~type_2, ~type_3,
    "a",   1,   5, 2,
    "b",   8,   2, 6,
    "c",   3,   8, 2
)

df %>% 
    gather(type, type_val, contains('type')) %>% 
    group_by(name) %>% 
    mutate(min_val = min(type_val), 
           min_col = type[type_val == min_val]) %>% 
    spread(type, type_val)
#> # A tibble: 3 x 6
#> # Groups:   name [3]
#>   name  min_val min_col type_1 type_2 type_3
#>   <chr>   <dbl> <chr>    <dbl>  <dbl>  <dbl>
#> 1 a           1 type_1       1      5      2
#> 2 b           2 type_2       8      2      6
#> 3 c           2 type_3       3      8      2

In practice, it may be preferable to leave the data in long form by dropping the spread call.

Caveats:

If more than one value can be equal to the min (or max or median or whatever), type_val == min_val will have two true values, and will thus have to be further summarized to reduce it to a single number, e.g. how which.min returns the first minimum.
At scale, reshaping may be expensive, so more convoluted but optimized approaches (e.g. leveraging max.col) may be preferable.

1

votes

Can you provide a little detail on the logic behind result_df?
Maybe it is possible to share your gather & grouping code?

Came up with the following intermediary result:

df$min_val = apply(df[2:4], 1, min) 
df$min_col = names(df[2:4])[apply( df[2:4], 1, which.min)]

0

votes

I may have missed something; and you may want a purely dplyr type response... but here's an approach:

I recreated data because I was uncertain why your result_df and df had differing values

df <- data.frame(name = letters[1:15], as.data.frame(
  lapply(1:3, function(i){
    sample(1:10, 15, T)
  })) %>% setNames(sprintf("type_%s", 1:ncol(.))
))

Then loop through/ apply rowwise so-to-speak and rebind

result_df <- lapply(1:nrow(df), function(i){
  check_df <- df[i,] %>% select(matches("type"))
  r <- check_df[which.min(as.numeric(check_df))]
  data.frame(df[i,], min_val = as.numeric(r), min_col = names(r))
}) %>% rbind_pages()


> df
>    name type_1 type_2 type_3
1     a      9      9      8
2     b      9      7      6
3     c      4      5      5
4     d      7      4      4
5     e      6      5      9
6     f      2      9      7
7     g      9     10      4
8     h      3      5      1
9     i      9      5      5
10    j      1      1      9
11    k      9      5      2
12    l      2      3      4
13    m      4      2      3
14    n      1      3      7
15    o      2      7      6

> result_df
   name type_1 type_2 type_3 min_val min_col
1     a      9      9      8       8  type_3
2     b      9      7      6       6  type_3
3     c      4      5      5       4  type_1
4     d      7      4      4       4  type_2
5     e      6      5      9       5  type_2
6     f      2      9      7       2  type_1
7     g      9     10      4       4  type_3
8     h      3      5      1       1  type_3
9     i      9      5      5       5  type_2
10    j      1      1      9       1  type_1
11    k      9      5      2       2  type_3
12    l      2      3      4       2  type_1
13    m      4      2      3       2  type_2
14    n      1      3      7       1  type_1
15    o      2      7      6       2  type_1

minimum (or maximum) value of each row across multiple columns

4 Answers