0
votes

Nesting and mapping with pipes seem to be an extremely viable workflow in the philosophy for driven analysis. On the downside, it takes a bit of headbanging to get the hold of the syntax...

Inspired by the idea, when I came through this Coding in R: Nest and map your way to efficient code. All good, but was wondering if it's possible to streamline the workflow, in short combine the:

  1. Nesting isn’t scary and your data hasn’t disappeared and
  2. Map your nest,

into one line, instead of two steps.

For reproducibility, we can take another SO question: Using nest and purrr::map outside of mutate, it's possible to drop the cyl column easily, but instead, if I want to

  1. select some specific column say, mpg, disp and vs for 4 cylinder and
  2. only mpg, disp for 8 cylinder and
  3. drop/ unmodify everything related to 6 cylinders and
  4. fit a lm() model with the selected varibles using map() family of functions and
  5. save the models using something like walk().
library(tidyverse)
mtcars %>%
  split(.$cyl) %>%
  map(~ .x %>% select(-cyl)) %>%
  walk2(names(.), ~write_csv(.x, paste0(.y, '.csv')))

That worked as it should, but when I try to apply the aprroach with and even without trying the goals 1-3, it throws error:

mtcars %>% group_by(cyl) %>% nest() %>% map(.$data, lm(.$mpg ~ .$disp + .$vs, .data))

Error: Index 1 must have length 1, not 10 Run rlang::last_error() to see where the error occurred. Will be great if the solution uses the newly introduced across() with dplyr 1.0.0.

2

2 Answers

1
votes

You could try something like this. I hope this can help (Not sure what you want in point 3 but I included an approach):

data("mtcars")
#Create list
List <- split(mtcars,mtcars$cyl)
#Create function
models <- function(x)
{
  cyl <- unique(x$cyl)
  if(cyl==4)
  {
    mymodel <- lm(mpg ~ disp+vs, data=x)
  } else if(cyl==8)
  {
    mymodel <- lm(mpg ~ disp, data=x)
  } else
  {
    mymodel <- lm(mpg ~ 1, data=x)
  }
  #Dataframe
  dfmymodel <- cbind(data.frame(Group=cyl,model=as.character(mymodel$call)[2]),as.data.frame(t(mymodel$coefficients)))
  return(dfmymodel)
}
#Apply function
List2 <- lapply(List, models)
#Final output
DF <- do.call(plyr::rbind.fill,List2)

  Group           model (Intercept)        disp        vs
1     4 mpg ~ disp + vs    42.65658 -0.13845873 -1.579492
2     6         mpg ~ 1    19.74286          NA        NA
3     8      mpg ~ disp    22.03280 -0.01963409        NA
0
votes

Here is another approach using purrr similar to these examples.

library(tidyverse)

mtcars %>% 
  group_by(cyl) %>% 
  nest() %>% 
  mutate(model = case_when(
    cyl == 4 ~ map(data, function(df) lm(mpg ~ disp + vs, data = df)),
    cyl == 8 ~ map(data, function(df) lm(mpg ~ disp , data = df)),
    TRUE     ~ map(data, function(df) lm(mpg ~ 1 , data = df))
    ),
    model_tidy = map(model, broom::tidy)) %>%
  select(cyl, model_tidy) %>%
  unnest


#-------
# A tibble: 6 x 6
    cyl term        estimate std.error statistic      p.value
  <dbl> <chr>          <dbl>     <dbl>     <dbl>        <dbl>
1     6 (Intercept)  19.7      0.549      35.9   0.0000000310
2     4 (Intercept)  42.7      5.16        8.26  0.0000346   
3     4 disp         -0.138    0.0353     -3.93  0.00438     
4     4 vs           -1.58     3.14       -0.503 0.629       
5     8 (Intercept)  22.0      3.35        6.59  0.0000259   
6     8 disp         -0.0196   0.00932    -2.11  0.0568