My goal is to fit a Poisson glmnet using the tidymodels packages. To do this, I use recipes to preprocess the data, parsnip to fit the model, workflows to bundle the model with the preprocessor, and poissonreg to make Poisson regression available in parsnip. Everything works fine when my training dataset contains only numeric predictors, but I can't fit the model when there are factor (categorical) predictors. You may think that tidymodels is overkill for the code below. It is for this minimal example, but eventually I will want to tune hyperparameters, validate my models, etc., and tidymodels will be useful then.

First, let's load the packages we need.

library(tibble)
library(recipes)
library(poissonreg)
library(parsnip)
library(workflows)
library(glmnet)

Let's also simulate a dataset with 1000 rows, 1 outcome (y), 1 categorical predictor with 2 levels (x_fac) and 3 numeric predictors (x_num_01, x_num_02 and x_num_03).

n <- 1000

dat <- tibble::tibble(
  y = rpois(n, lambda = 0.15),
  x_fac = factor(sample(c("M", "F"), size = n, replace = TRUE)),
  x_num_01 = rnorm(n),
  x_num_02 = rnorm(n),
  x_num_03 = rnorm(n)
)

Then, we define and prepare the recipe. The preprocessing is very simple: any categorical predictors are converted to dummy variables.

rec <- 
  recipes::recipe(y ~ ., data = dat) %>% 
  recipes::step_dummy(all_nominal()) %>% 
  recipes::prep()
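
For reference, here is a minimal sketch of how to check what step_dummy() produces, by prepping a recipe on its own and baking the training data. The object names rec_prepped and baked are just illustrative, and the simulated data are a smaller stand-in for dat above. This only inspects the preprocessing; it is not part of the model fit.

```r
library(recipes)

set.seed(123)
n <- 1000

# Small stand-in for the simulated data in the question
dat <- tibble::tibble(
  y = rpois(n, lambda = 0.15),
  x_fac = factor(sample(c("M", "F"), size = n, replace = TRUE)),
  x_num_01 = rnorm(n)
)

# Prep the recipe on its own, only to look at the processed columns
rec_prepped <-
  recipe(y ~ ., data = dat) %>%
  step_dummy(all_nominal()) %>%
  prep()

# Apply the trained steps to the data
baked <- bake(rec_prepped, new_data = dat)

# The factor column is replaced by one numeric dummy column
# (the first factor level, "F", is the reference level)
names(baked)
```

The baked data should contain a numeric x_fac_M column in place of the original x_fac factor.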

Then we define our model,

glmnet_mod <- 
  poissonreg::poisson_reg(penalty = 0.01, mixture = 1) %>% 
  parsnip::set_engine("glmnet")

bundle the model and the preprocessor together with the workflows package

glmnet_wf <- 
  workflows::workflow() %>%
  workflows::add_recipe(rec) %>% 
  workflows::add_model(glmnet_mod)

and finally, we train the model with parsnip:

glmnet_fit <- 
  glmnet_wf %>% 
  parsnip::fit(data = dat)

This call to parsnip::fit throws the following error:

Error in fishnet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  : 
  NA/NaN/Inf in foreign function call (arg 4)
In addition: Warning message:
In fishnet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  :
  NAs introduced by coercion
Timing stopped at: 0.005 0 0.006

and I have absolutely no idea why! If I remove the predictor x_fac from the simulated dataset dat, the fit works fine. It also works if I preprocess the data myself and call glmnet directly from the glmnet package:

x <- dat %>%
  dplyr::mutate(x_fac_M = x_fac == "M") %>%
  dplyr::select(contains("x"), -x_fac) %>%
  as.matrix()
y <- dat$y

glmnet::glmnet(x = x, y = y, family = "poisson", lambda = 0.01, alpha = 1)

Thanks for your help!

Session info:

R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] workflows_0.1.1  poissonreg_0.0.1 parsnip_0.1.0    recipes_0.1.12  
[5] dplyr_0.8.5      tibble_3.0.1    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6       pillar_1.4.4       compiler_4.0.0     gower_0.2.1       
 [5] iterators_1.0.12   class_7.3-16       tools_4.0.0        rpart_4.1-15      
 [9] ipred_0.9-9        packrat_0.5.0      lubridate_1.7.8    lifecycle_0.2.0   
[13] lattice_0.20-41    pkgconfig_2.0.3    rlang_0.4.6        foreach_1.5.0     
[17] Matrix_1.2-18      cli_2.0.2          rstudioapi_0.11    prodlim_2019.11.13
[21] withr_2.2.0        generics_0.0.2     vctrs_0.2.4        glmnet_3.0-2      
[25] grid_4.0.0         nnet_7.3-13        tidyselect_1.0.0   glue_1.4.0        
[29] R6_2.4.1           fansi_0.4.1        survival_3.1-12    lava_1.6.7        
[33] purrr_0.3.4        tidyr_1.0.2        magrittr_1.5       codetools_0.2-16  
[37] ellipsis_0.3.0     MASS_7.3-51.5      splines_4.0.0      hardhat_0.1.2     
[41] assertthat_0.2.1   shape_1.4.4        timeDate_3043.102  utf8_1.1.4        
[45] crayon_1.3.4

1 Answer


OK, I just figured it out. It seems that the recipe added to the workflow should not be prepped, i.e. it should not have prep() called on it. So just change this part:

rec <- 
  recipes::recipe(y ~ ., data = dat) %>% 
  recipes::step_dummy(all_nominal()) %>% 
  recipes::prep()

to the following:

rec <- 
  recipes::recipe(y ~ ., data = dat) %>% 
  recipes::step_dummy(all_nominal())
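
Putting it together, here is a sketch of the full pipeline with the unprepped recipe. It assumes the same simulated data as in the question; the set.seed() call and the preds object are my additions for illustration. As far as I understand, fit() on a workflow estimates the recipe itself on the training data, which is why the recipe is passed in untrained.

```r
library(recipes)
library(parsnip)
library(poissonreg)
library(workflows)
library(glmnet)

set.seed(123)
n <- 1000

dat <- tibble::tibble(
  y = rpois(n, lambda = 0.15),
  x_fac = factor(sample(c("M", "F"), size = n, replace = TRUE)),
  x_num_01 = rnorm(n),
  x_num_02 = rnorm(n),
  x_num_03 = rnorm(n)
)

# Recipe is only defined here, NOT prepped
rec <-
  recipe(y ~ ., data = dat) %>%
  step_dummy(all_nominal())

# Lasso-penalized Poisson regression via glmnet
glmnet_mod <-
  poisson_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("glmnet")

# Bundle recipe and model, then fit; the workflow handles the preprocessing
glmnet_fit <-
  workflow() %>%
  add_recipe(rec) %>%
  add_model(glmnet_mod) %>%
  fit(data = dat)

# Predictions come back as a tibble with a .pred column
preds <- predict(glmnet_fit, new_data = dat)
head(preds)
```

If you want to look at the processed data, prep and bake a separate copy of the recipe rather than the one you add to the workflow.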