Automating, dataframe %>% nest %>% purrr:map / walk tidyverse workflow with conditional selection of columns

Question

Nesting and mapping with pipes seem to be an extremely viable workflow in the tidyverse philosophy for data.frame driven analysis. On the downside, it takes a bit of headbanging to get the hold of the syntax...

Inspired by the idea, when I came through this Coding in R: Nest and map your way to efficient code. All good, but was wondering if it's possible to streamline the workflow, in short combine the:

Nesting isn’t scary and your data hasn’t disappeared and
Map your nest,

into one line, instead of two steps.

For reproducibility, we can take another SO question: Using nest and purrr::map outside of mutate, it's possible to drop the cyl column easily, but instead, if I want to

select some specific column say, mpg, disp and vs for 4 cylinder and
only mpg, disp for 8 cylinder and
drop/ unmodify everything related to 6 cylinders and
fit a lm() model with the selected varibles using map() family of functions and
save the models using something like walk().

library(tidyverse)
mtcars %>%
  split(.$cyl) %>%
  map(~ .x %>% select(-cyl)) %>%
  walk2(names(.), ~write_csv(.x, paste0(.y, '.csv')))

That worked as it should, but when I try to apply the aprroach with nest and map even without trying the goals 1-3, it throws error:

mtcars %>% group_by(cyl) %>% nest() %>% map(.$data, lm(.$mpg ~ .$disp + .$vs, .data))

Error: Index 1 must have length 1, not 10 Run rlang::last_error() to see where the error occurred. Will be great if the solution uses the newly introduced across() with dplyr 1.0.0.

Duck Duck · Accepted Answer · 2020-06-23T15:20:20

You could try something like this. I hope this can help (Not sure what you want in point 3 but I included an approach):

data("mtcars")
#Create list
List <- split(mtcars,mtcars$cyl)
#Create function
models <- function(x)
{
  cyl <- unique(x$cyl)
  if(cyl==4)
  {
    mymodel <- lm(mpg ~ disp+vs, data=x)
  } else if(cyl==8)
  {
    mymodel <- lm(mpg ~ disp, data=x)
  } else
  {
    mymodel <- lm(mpg ~ 1, data=x)
  }
  #Dataframe
  dfmymodel <- cbind(data.frame(Group=cyl,model=as.character(mymodel$call)[2]),as.data.frame(t(mymodel$coefficients)))
  return(dfmymodel)
}
#Apply function
List2 <- lapply(List, models)
#Final output
DF <- do.call(plyr::rbind.fill,List2)

  Group           model (Intercept)        disp        vs
1     4 mpg ~ disp + vs    42.65658 -0.13845873 -1.579492
2     6         mpg ~ 1    19.74286          NA        NA
3     8      mpg ~ disp    22.03280 -0.01963409        NA

Automating, dataframe %>% nest %>% purrr:map / walk tidyverse workflow with conditional selection of columns

2 Answers