I am fitting the same Generalized Additive Model to multiple data sets using the bam function from mgcv. For most of my data sets the fit completes in a reasonable 10 to 20 minutes, but for a few the run takes more than 10 hours. I cannot find any similarities between the slow cases: the final fit is neither exceptionally good nor bad, and the data contain no noticeable outliers.
How can I figure out why the fit is so slow in these instances, and how might I speed them up?
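For what it's worth, this is how I have been trying to instrument a single fit, in case that output helps with the diagnosis. The formula is abbreviated to one smooth as a placeholder for the full call below; `gam.control(trace=TRUE)` and base R's `Rprof` are the only diagnostics I know of, so this may well not be the right approach:

```r
library(mgcv)

Rprof("bam_profile.out")                 # base-R sampling profiler
fit <- bam(
  value ~ s(share_of_year, k = length(knotsYear), bs = "cc"),  # abbreviated formula
  knots   = list(share_of_year = knotsYear),
  family  = quasipoisson(),
  data    = data,
  control = gam.control(trace = TRUE)    # print per-iteration diagnostic output
)
Rprof(NULL)
summaryRprof("bam_profile.out")$by.self  # which internal routines dominate the run time
```

So far the trace output looks similar between fast and slow cases, which is why I am asking here.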
My model contains two smooth terms (using a cyclic cubic spline basis) and some additional numeric and factor variables. In total, 300 coefficients (including those for the smooth terms) are estimated. I intentionally keep the number of knots below the information-theoretically optimal number to speed up fitting. My data sets contain around 850k rows each.
This is the function call:
bam(
  value
  ~ 0
  + weekday_x
  + weekday
  + time
  + "a couple of factor variables encoding special events"
  + delta:weekday
  + s(share_of_year, k=length(knotsYear), bs="cc")
  + s(share_of_year_x, k=length(knotsYear), bs="cc")
  , knots=list(
      share_of_year=knotsYear
    , share_of_year_x=knotsYear
  )
  , family=quasipoisson()
  , data=data
)
knotsYear contains 26 knots.
This model converges reasonably fast in most cases but incredibly slowly in a few.
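In case it matters, these are the parallelization options from ?bam that I have considered but not yet tested properly. I am not sure whether quasipoisson() is compatible with discrete=TRUE, so that part is an assumption on my part:

```r
library(mgcv)
library(parallel)

# Option 1: discretized covariates plus threading (requires the default
# method = "fREML"; I am not certain quasi families are supported here)
fit1 <- bam(value ~ s(share_of_year, k = length(knotsYear), bs = "cc"),
            knots = list(share_of_year = knotsYear),
            family = quasipoisson(), data = data,
            discrete = TRUE, nthreads = 4)

# Option 2: a socket cluster, which bam can use when discrete = FALSE
cl <- makeCluster(4)
fit2 <- bam(value ~ s(share_of_year, k = length(knotsYear), bs = "cc"),
            knots = list(share_of_year = knotsYear),
            family = quasipoisson(), data = data,
            cluster = cl)
stopCluster(cl)
```

These would at best reduce the wall-clock time by a constant factor, though; they would not explain why a handful of data sets behave so differently from the rest.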