map over columns and apply custom function

Question

Missing something small here and struggling to pass columns to function. I just want to map (or lapply) over columns and perform a custom function on each of the columns. Minimal example here:

library(tidyverse)
set.seed(10)
df <- data.frame(id = c(1,1,1,2,3,3,3,3),
                    r_r1 = sample(c(0,1), 8, replace =  T),
                    r_r2 = sample(c(0,1), 8, replace =  T),
                    r_r3 = sample(c(0,1), 8, replace =  T))
df
#   id r_r1 r_r2 r_r3
# 1  1    0    0    1
# 2  1    0    0    1
# 3  1    1    0    1
# 4  2    1    1    0
# 5  3    1    0    0
# 6  3    0    0    1
# 7  3    1    1    1
# 8  3    1    0    0

a function just to filter and counts unique ids remaining in the dataset:

cnt_un <-  function(var) {
  df %>% 
    filter({{var}} == 1) %>% 
    group_by({{var}}) %>% 
    summarise(n_uniq = n_distinct(id)) %>% 
    ungroup()
}

it works outside of map

cnt_un(r_r1)
# A tibble: 1 x 2
   r_r1 n_uniq
  <dbl>  <int>
1     1      3

I want to apply the function over all r_r columns to get something like:

df2
#      y n_uniq
# 1 r_r1      3
# 2 r_r2      2
# 3 r_r3      2

I thought the following would work but doesnt

map(dplyr::select(df, matches("r_r")), ~ cnt_un(.x))

any suggestions? thanks

camille camille · Accepted Answer · 2020-01-14T14:56:24

I'm not sure if there's a direct tidyeval way to do this with something like map. The issue you're running into is that in calling map(df, *whatever_function*), the function is being called on each column of df as a vector, whereas your function expects a bare column name in the tidyeval style. To verify that:

map(df, class)

will return "numeric" for each column.

An alternative is to iterate over column names as strings, and convert those to symbols; this takes just one additional line in the function.

library(dplyr)
library(tidyr)
library(purrr)

cnt_un_name <- function(varname) {
  var <- ensym(varname)
  df %>% 
    filter({{var}} == 1) %>% 
    group_by({{var}}) %>% 
    summarise(n_uniq = n_distinct(id)) %>% 
    ungroup()
}

Calling the function is a little awkward because it keeps only the relevant column names (calling on "r_r1" gets columns "r_r1" and "n_uniq", etc). One way is to get the vector of column names you want, name it so you can add an ID column in map_dfr, and drop the extra columns, since they'll be mostly NA.

grep("^r_r\\d+", names(df), value = TRUE) %>%
  set_names() %>%
  map_dfr(cnt_un_name, .id = "y") %>%
  select(y, n_uniq)
#> # A tibble: 3 x 2
#>   y     n_uniq
#>   <chr>  <int>
#> 1 r_r1       3
#> 2 r_r2       2
#> 3 r_r3       2

A better way is to call the function, then bind after reshaping.

grep("^r_r\\d+", names(df), value = TRUE) %>%
  map(cnt_un_name) %>%
  map_dfr(pivot_longer, 1, names_to = "y") %>%
  select(y, n_uniq)
# same output as above

Alternatively (and maybe better/more scaleable) would be to do the column renaming inside the function definition.

map over columns and apply custom function

3 Answers