running multiple regression models using tidymodels

Question

I've recently been using tidymodels to run models and select the parameters that best satisfy some objective function. For example, take a hypothetical regression on the mtcars data (using the regression examples from the bottom answer of this question):

library(tidymodels)
library(tidyverse)

#some regression model
cars_recipe <- recipe(mpg ~ disp + drat, data = mtcars)

wf <- workflow() %>%
  add_recipe(cars_recipe)

(This roughly follows the syntax from this blog post for comparison; for clarity, I'm skipping steps such as the train/test split in this example.)

I can then fit many models and collect their metrics (in this case across a range of penalties for a lasso, i.e. an elastic net with mixture = 1) like so:

#run over a parameter space and find metrics as an objective
mtcars_bootstrap <- bootstraps(mtcars)

tune_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

lambda_grid <- grid_regular(penalty(), levels = 50)

lasso_grid <- tune_grid(
  wf %>% add_model(tune_spec),
  resamples = mtcars_bootstrap,
  grid = lambda_grid
)

But let's say I have good reason to think there are two separate models which may best capture the effect on (e.g.) the mpg of a car, so I create a second model as a recipe:

cars_recipe2 <- recipe(mpg ~ I(disp + drat), data = mtcars)

Now I could just run this recipe through the above pipeline as well, using lapply or the purrr family of functions. However, I wondered whether there is some built-in way to run multiple recipes through tidymodels?

It seems like there should be, though I also thought it might be precluded by design to prevent p-hacking.
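
For comparison, the manual purrr approach mentioned above could look roughly like the sketch below. It is only an illustration: the second recipe (`logged`) is a hypothetical stand-in rather than the recipe from the question, and the number of bootstrap resamples and grid levels is kept small so that it runs quickly.

```r
library(tidymodels)

# Two candidate recipes; "logged" is a hypothetical stand-in for a second model
recipes <- list(
  simple = recipe(mpg ~ disp + drat, data = mtcars),
  logged = recipe(mpg ~ disp + drat, data = mtcars) %>% step_log(disp)
)

set.seed(123)
boots <- bootstraps(mtcars, times = 5)

tune_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

lambda_grid <- grid_regular(penalty(), levels = 5)

# Tune the same model once per recipe
results <- purrr::map(recipes, function(rec) {
  workflow() %>%
    add_recipe(rec) %>%
    add_model(tune_spec) %>%
    tune_grid(resamples = boots, grid = lambda_grid)
})

# Stack the metrics, labelling each row with the recipe it came from
all_metrics <- dplyr::bind_rows(
  purrr::map(results, collect_metrics),
  .id = "recipe"
)
all_metrics
```

This works, but every recipe has to be wired into a workflow by hand, and the bookkeeping grows quickly once several models are involved as well.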

Julia Silge · Accepted Answer · 2021-02-09T21:57:36

There is an experimental package we are developing to do just this called workflowsets. You can install it from GitHub for now, if you are up for trying out a new, still-developing package:

devtools::install_github("tidymodels/workflowsets")

Then you can set up an analysis like this:

library(tidymodels)
library(workflowsets)

mtcars_boot <- bootstraps(mtcars)

rec1 <- recipe(mpg ~ disp + drat, data = mtcars)
rec2 <- recipe(mpg ~ disp + drat, data = mtcars) %>%
  step_log(disp) %>%
  step_normalize(disp, drat)

lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

# put it all together in a "workflow set"
car_models <- 
  workflow_set(
    preproc = list(simple = rec1, preproc = rec2),
    models = list(lasso = lasso_spec),
    cross = TRUE
  )
car_models
#> # A workflow set/tibble: 2 x 4
#>   wflow_id      info             option    result    
#>   <chr>         <list>           <list>    <list>    
#> 1 simple_lasso  <tibble [1 × 4]> <opts[0]> <list [0]>
#> 2 preproc_lasso <tibble [1 × 4]> <opts[0]> <list [0]>
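
The `cross = TRUE` argument matters more once there are several models. As a small sketch, assuming a second, purely illustrative model specification (`ridge_spec`, which is not part of the original example), every recipe is combined with every model:

```r
library(tidymodels)
library(workflowsets)

rec1 <- recipe(mpg ~ disp + drat, data = mtcars)
rec2 <- recipe(mpg ~ disp + drat, data = mtcars) %>%
  step_log(disp) %>%
  step_normalize(disp, drat)

lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

# Hypothetical second model, added only to illustrate crossing
ridge_spec <- linear_reg(penalty = tune(), mixture = 0) %>%
  set_engine("glmnet")

crossed_models <- workflow_set(
  preproc = list(simple = rec1, preproc = rec2),
  models = list(lasso = lasso_spec, ridge = ridge_spec),
  cross = TRUE
)

# 2 recipes x 2 models = 4 workflows, with ids of the form "<preproc>_<model>"
nrow(crossed_models)
crossed_models$wflow_id
```

With `cross = FALSE`, the preprocessors and models are instead paired up element by element, so both lists need to have the same length.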

Now that you have a workflow set, you can "map" over it using, in this case, tune_grid(), passing along the other arguments you want to use, such as the resamples and grid.

lambda_grid <- grid_regular(penalty(range = c(-2, 0)), levels = 10)

car_res <- car_models %>%
  workflow_map("tune_grid", resamples = mtcars_boot, 
               grid = lambda_grid, verbose = TRUE)
#> i 1 of 2 tuning:     simple_lasso
#> ✓ 1 of 2 tuning:     simple_lasso (7.7s)
#> i 2 of 2 tuning:     preproc_lasso
#> ✓ 2 of 2 tuning:     preproc_lasso (8.4s)

## some autoplot methods are available
autoplot(car_res)

Created on 2021-02-09 by the reprex package (v1.0.0)

This is still in progress, so if you have high stability needs, I would wait a few months before using it. We are excited about how it will be able to fit people's needs, though!
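
Once the workflow set has been tuned, a natural next step is to compare the candidates and pull out the winner. The sketch below is one way to do that, under some assumptions: it uses `rank_results()`, `extract_workflow_set_result()` and `extract_workflow()`, which are available in released versions of workflowsets and may not match the early GitHub version described above; RMSE is an arbitrary choice of metric; and the resample count and grid are reduced to keep it quick.

```r
library(tidymodels)
library(workflowsets)

set.seed(123)
mtcars_boot <- bootstraps(mtcars, times = 5)

rec1 <- recipe(mpg ~ disp + drat, data = mtcars)
rec2 <- recipe(mpg ~ disp + drat, data = mtcars) %>%
  step_log(disp) %>%
  step_normalize(disp, drat)

lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

car_models <- workflow_set(
  preproc = list(simple = rec1, preproc = rec2),
  models = list(lasso = lasso_spec)
)

lambda_grid <- grid_regular(penalty(range = c(-2, 0)), levels = 5)

car_res <- car_models %>%
  workflow_map("tune_grid", resamples = mtcars_boot, grid = lambda_grid)

# Rank every workflow/penalty combination by RMSE (lower is better)
ranked <- rank_results(car_res, rank_metric = "rmse")

# Id of the workflow that holds the top-ranked configuration
best_id <- ranked %>%
  filter(.metric == "rmse") %>%
  slice_min(rank, n = 1, with_ties = FALSE) %>%
  pull(wflow_id)

# Best penalty for that workflow
best_params <- car_res %>%
  extract_workflow_set_result(best_id) %>%
  select_best(metric = "rmse")

# Plug the penalty into the workflow and fit it on the full data
final_fit <- car_res %>%
  extract_workflow(best_id) %>%
  finalize_workflow(best_params) %>%
  fit(data = mtcars)
```

Because the ranking is based on the same resamples for every workflow, the comparison is like for like. In a real analysis you would still want a held-out test set before trusting the selected model.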
