I’m using rstanarm to fit stan_glm models inside a function, and I’m running into a problem: the fitted model object balloons in size when saved to .rds, but only when the model is fit inside a function. The cause seems to be that the fitted object keeps a reference to the function’s local environment, which then gets written to disk by write_rds. Manually removing large objects inside the function more or less solves the problem, but that’s a pretty clunky solution. Does anyone have advice on a more elegant way to resolve this? Toy reprex below (warning: this writes some .rds files to disk; I remove them at the end of the example, but heads up).
library(readr)
library(rstanarm)
#> Loading required package: Rcpp
#> rstanarm (Version 2.19.3, packaged: 2020-02-11 05:16:41 UTC)
#> - Do not expect the default priors to remain the same in future rstanarm versions.
#> Thus, R scripts should specify priors explicitly, even if they are just the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
#> - bayesplot theme set to bayesplot::theme_default()
#> * Does _not_ affect other ggplot2 plots
#> * See ?bayesplot_theme_set for details on theme setting
library(gapminder)
# create a largeish object
test <- matrix(data = rnorm(10000), nrow = 10000/2, ncol = 10000/2)
# fit model in the global environment
a <- stan_glm(lifeExp ~ gdpPercap, data = gapminder, refresh = 0)
print(object.size(a), unit = "Mb")
#> 1.4 Mb
# fit model inside a function, passing but not using the largeish object
memfoo <- function(gap, testy, clean = FALSE) {
  d <- testy
  if (clean) {
    rm(d, testy)
  }
  a <- stan_glm(lifeExp ~ gdpPercap, data = gap, refresh = 0)
}
b <- memfoo(gapminder, test)
# fit model again, but removing large objects from the function environment before fitting
d <- memfoo(gapminder, test, clean = TRUE)
print(object.size(a), unit = "Mb")
#> 1.4 Mb
print(object.size(b), unit = "Mb")
#> 1.4 Mb
print(object.size(d), unit = "Mb")
#> 1.4 Mb
# all same size in memory
# write to .rds
write_rds(a,"a.rds")
write_rds(b,"b.rds")
write_rds(d,"d.rds")
# the rstanarm fit from the function with the largeish object in its environment is
# about 45 times bigger on disk than the same regression fit outside the function!
file.size("a.rds")
#> [1] 9026456
file.size("b.rds")
#> [1] 410011317
file.size("d.rds")
#> [1] 10011197
file.size("b.rds") / file.size("a.rds")
#> [1] 45.42329
file.remove(c("a.rds", "b.rds", "d.rds"))
#> [1] TRUE TRUE TRUE
Created on 2020-03-11 by the reprex package (v0.3.0)
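The suspected mechanism can be checked without Stan at all. The sketch below uses only base R and rests on an assumption I have not confirmed for rstanarm specifically: that the formula (or terms) stored in the fitted object is what carries the reference to the function's environment. `make_formula`, `ser_size`, and the object names are hypothetical helpers for illustration. It would also be consistent with the in-memory sizes matching, since `object.size` does not count environments.

```r
# Base-R sketch: a formula created inside a function keeps a reference to that
# function's environment, and serialize() writes that environment out as well.
make_formula <- function(big) {
  # the copy is never used by the formula, but it lives in the environment
  # that the formula captures (mirrors `d <- testy` in the reprex)
  local_copy <- big
  y ~ x
}

big <- rnorm(1e6)               # about 8 MB of doubles
f_global <- y ~ x               # formula created outside the function
f_local  <- make_formula(big)   # formula whose environment is the call frame

# serialized size in bytes, roughly what saveRDS()/write_rds() has to write
ser_size <- function(obj) length(serialize(obj, NULL))

size_global <- ser_size(f_global)
size_local  <- ser_size(f_local)

# possible workaround (an assumption, not an rstanarm recommendation):
# re-point the formula at the global environment before saving
f_fixed <- f_local
environment(f_fixed) <- globalenv()
size_fixed <- ser_size(f_fixed)

c(global = size_global, local = size_local, fixed = size_fixed)
```

If the same thing is happening in the reprex, inspecting `environment(b$formula)` with `ls()` should list `d` and `testy`. Resetting the environment has a cost: anything that later re-evaluates the formula (for example `update()`) will look up variables in the new environment instead of the function's.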