1
votes

I would like to know how to use dplyr's mutate function when I don't know the column names in advance. Here is my example code:

library(dplyr)
w<-c(2,3,4)
x<-c(1,2,7)
y<-c(1,5,4)
z<-c(3,2,6)
df <- data.frame(w,x,y,z)
df %>% rowwise() %>% mutate(minimum = min(x,y,z))

Source: local data frame [3 x 5]
Groups: <by row>

# A tibble: 3 x 5
      w     x     y     z     minimum
    <dbl> <dbl> <dbl> <dbl>   <dbl>
1     2     1     1     3       1 
2     3     2     5     2       2 
3     4     7     4     6       4

This code finds the minimum value in each row. df %>% rowwise() %>% mutate(minimum = min(x,y,z)) works because I typed the column names x, y, and z. But suppose I have a really big data.frame with several hundred columns and I don't know all of the column names. Or I have multiple data.frames that all have different column names, and I just want to find the minimum value across the 10th to 20th columns in each row of each data.frame.

In the example data.frame above, let's assume that I don't know the column names and just want the minimum value across the 2nd to 4th columns in each row. The following doesn't work, because df[,2], df[,3], and df[,4] each refer to an entire column rather than the current row's values, so min() returns the overall minimum (1) for every row:

df %>% rowwise() %>% mutate(minimum=min(df[,2],df[,3], df[,4]))  

Source: local data frame [3 x 5]
Groups: <by row>

# A tibble: 3 x 5
       w     x     y     z    minimum
     <dbl> <dbl> <dbl> <dbl>   <dbl>
 1     2     1     1     3       1
 2     3     2     5     2       1
 3     4     7     4     6       1

The two attempts below also don't work, because they pass the column names as strings rather than the column values:

 df %>% rowwise() %>% mutate(average=min(colnames(df)[2], colnames(df)[3], colnames(df)[4]))  
 df %>% rowwise() %>% mutate(average=min(noquote(colnames(df)[2]), noquote(colnames(df)[3]), noquote(colnames(df)[4])))  

I know that I can get the minimum value with apply or other methods when I don't know the column names. But I would like to know whether dplyr's mutate function can do this without knowing the column names.

Thank you,

3
You may want some sort of tidyeval approach, like mutate(minimum = min(!!!syms(names(df)[2:4]))). If you decide to go with tidyeval, see a roundup of some tidyeval resources here. - aosmith
Thank you aosmith! I am going to study tidyeval. - Sanghoon Lee

3 Answers

2
votes

With apply:

library(dplyr)
library(purrr)

df %>%
  mutate(minimum = apply(df[,2:4], 1, min))

or with pmap:

df %>%
  mutate(minimum = pmap_dbl(.[2:4], min))

Also with by_row from purrrlyr:

df %>%
  purrrlyr::by_row(~min(.[2:4]), .collate = "rows", .to = "minimum")

Output:

# tibble [3 x 5]
      w     x     y     z minimum
  <dbl> <dbl> <dbl> <dbl>   <dbl>
1     2     1     1     3       1
2     3     2     5     2       2
3     4     7     4     6       4
1
votes

A vectorized option would be pmin. Convert the column names to symbols with syms, then unquote-splice them (!!!) so that pmin is applied to the values of those columns:

library(dplyr)
df %>% 
  mutate(minimum = pmin(!!! rlang::syms(names(.)[2:4])))
#  w x y z minimum
#1 2 1 1 3       1
#2 3 2 5 2       2
#3 4 7 4 6       4
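
The same vectorized idea can also be sketched without tidyeval, by selecting the columns by position and handing them to pmin through do.call. This is only an illustration and assumes the selected columns are all numeric; the helper name row_min_by_position is hypothetical.

```r
library(dplyr)

df <- data.frame(w = c(2, 3, 4), x = c(1, 2, 7), y = c(1, 5, 4), z = c(3, 2, 6))

# Hypothetical helper: row-wise minimum over columns chosen by position.
# unname() drops the column names so they are not matched against
# pmin's own argument names (for example a column called na.rm).
row_min_by_position <- function(data, positions) {
  do.call(pmin, unname(as.list(data[positions])))
}

result <- df %>%
  mutate(minimum = row_min_by_position(df, 2:4))

result$minimum
#> [1] 1 2 4
```

Because pmin works on whole columns at once, no rowwise() step is needed here.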
0
votes

Here is a tidyeval approach along the lines of the suggestion from aosmith. If you don't know the column names, you can make a function that accepts the desired positions as inputs and finds the column names itself. Here, rlang::syms() takes the column names as strings and turns them into symbols, and !!! unquotes and splices the symbols into the function call.

library(dplyr)
w<-c(2,3,4)
x<-c(1,2,7)
y<-c(1,5,4)
z<-c(3,2,6)
df <- data.frame(w,x,y,z)

rowwise_min <- function(df, min_cols){
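  # names(df)[min_cols] also works when a single position is given,
  # whereas df[, min_cols] would drop to a plain vector without column names
  cols <- rlang::syms(names(df)[min_cols])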
  df %>%
    rowwise %>%
    mutate(minimum = min(!!!cols))
}

rowwise_min(df, 2:4)
#> Source: local data frame [3 x 5]
#> Groups: <by row>
#> 
#> # A tibble: 3 x 5
#>       w     x     y     z minimum
#>   <dbl> <dbl> <dbl> <dbl>   <dbl>
#> 1     2     1     1     3       1
#> 2     3     2     5     2       2
#> 3     4     7     4     6       4
rowwise_min(df, c(1, 3))
#> Source: local data frame [3 x 5]
#> Groups: <by row>
#> 
#> # A tibble: 3 x 5
#>       w     x     y     z minimum
#>   <dbl> <dbl> <dbl> <dbl>   <dbl>
#> 1     2     1     1     3       1
#> 2     3     2     5     2       3
#> 3     4     7     4     6       4

Created on 2018-09-04 by the reprex package (v0.2.0).
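
With dplyr 1.0.0 or later, a similar result can probably be reached without converting names to symbols, since c_across() accepts column positions inside a rowwise() pipeline. The sketch below assumes such a dplyr version is installed; the function name rowwise_min_pos is hypothetical and not part of the answer above.

```r
library(dplyr)

df <- data.frame(w = c(2, 3, 4), x = c(1, 2, 7), y = c(1, 5, 4), z = c(3, 2, 6))

# Hypothetical helper: row-wise minimum over the columns at positions min_cols.
# all_of() marks min_cols as an external vector of column positions.
rowwise_min_pos <- function(data, min_cols) {
  data %>%
    rowwise() %>%
    mutate(minimum = min(c_across(all_of(min_cols)))) %>%
    ungroup()
}

res_2_4 <- rowwise_min_pos(df, 2:4)
res_1_3 <- rowwise_min_pos(df, c(1, 3))

res_2_4$minimum
#> [1] 1 2 4
res_1_3$minimum
#> [1] 1 3 4
```

The trailing ungroup() drops the row-wise grouping so later steps treat the result as an ordinary tibble.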