I often want to map over a vector of column names in a data frame and keep track of the output using the .id argument. But writing the column name for each map iteration into that .id column seems to require doubling up the names in the input vector; in other words, naming each column name with its own name. If I don't name each element with its own name, then .id just stores the index of the iteration.
This is expected behavior, per the purrr::map docs:
.id: Either a string or NULL. If a string, the output will contain a variable with that name, storing either the name (if .x is named) or the index (if .x is unnamed) of the input.
But my approach feels a little clunky, so I imagine I'm missing something. Is there a better way to get a list of the columns I'm iterating over, that doesn't require writing each column name twice in the input vector? Any suggestions would be much appreciated!
Here's an example to work with:
library(rlang)
library(tidyverse)
tb <- tibble(foo = rnorm(10), bar = rnorm(10))
cols_once <- c("foo", "bar")
cols_once %>% map_dfr(~ tb %>% summarise(avg = mean(!!sym(.x))), .id="var")
# A tibble: 2 x 2
  var       avg    <-- var stores only the iteration index
  <chr>   <dbl>
1 1     -0.0519
2 2      0.204
cols_twice <- c("foo" = "foo", "bar" = "bar")
cols_twice %>% map_dfr(~ tb %>% summarise(avg = mean(!!sym(.x))), .id="var")
# A tibble: 2 x 2
  var       avg    <-- var stores the column names
  <chr>   <dbl>
1 foo   -0.0519
2 bar    0.204
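One possible way to avoid typing each name twice is sketched below. It relies on purrr's set_names(), which, when called without a second argument, names each element of a vector after its own value. This is only a sketch built on the example above; the intermediate name res is introduced here for illustration and is not part of the original code.

```r
library(rlang)
library(tidyverse)

tb <- tibble(foo = rnorm(10), bar = rnorm(10))
cols_once <- c("foo", "bar")

# set_names() with no extra arguments names each element after itself,
# so the vector becomes c(foo = "foo", bar = "bar") without retyping.
res <- cols_once %>%
  set_names() %>%
  map_dfr(~ tb %>% summarise(avg = mean(!!sym(.x))), .id = "var")

# The var column now holds "foo" and "bar" instead of "1" and "2".
res$var
```

The column names are still written only once in cols_once; the naming step happens inside the pipeline, just before the map call.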
t as a variable name; I hope you don't intend on transposing anything... – Artem Sokolov
t(). will update! – andrew_reece