Starting with dummy data for the question:
zone <- c(rep("Z1",1000),rep("Z2",1000),rep("Z3",1000),rep("Z4",1000))
scenario <- rep(c(rep("S1",250),rep("S2",250),rep("S3",250),rep("S4",250)),4)
model <- rep(rep(c(rep("m1",50),rep("m2",50),rep("m3",50),rep("m4",50),rep("m5",50)),4),4)
time <- rep(seq(2021,2070), 80)
response <- rep(c(rep(rnbinom(50, mu =exp(4), size = 50), 5),
rep(rnbinom(50, mu =exp(4.5), size = 50), 5),
rep(rnbinom(50, mu =exp(5), size = 50), 5),
rep(rnbinom(50, mu =exp(6), size = 50), 5)),4)
df <- cbind(zone, scenario, model, time, response)
df <- as.data.frame(df)
df$zone <- as.factor(df$zone)
df$scenario <- as.factor(df$scenario)
df$model <- as.factor(df$model)
df$time <- as.numeric(df$time)
df$response <- as.numeric(df$response)
I'm trying to fit a fairly simple GAM model in mgcv in R. I'm interested in how a response variable might vary through time (predictions between 2021 - 2070). I expect that the response will be non-linear, so I first fit a smoother over time:
library(mgcv)
m1 <- response ~ s(time)
In this case, there are four models (m1, m2, m3, m4) that each predict how the response may vary into the future under four different "scenarios" of increasing intensity (S1, S2, S3, S4). I'm not explicitly interested in how the models vary (I expect them to), but instead I'm interested in how the scenarios (from all four model predictions) may vary into the future. To account for this structure of the data (i.e. that each scenario contains four model predictions), I've included "model" as a random effect:
m2 <- gam(response~s(time, k=-1) + scenario + s(model, bs='re'), data=df)
Given that the non-linear response between "response" and "time" is likely to vary among scenarios, I've fitted a model with separate smoothers for each level of scenario using the "by" function:
m3 <- gam(response~s(time, by=scenario, k=-1) + scenario + s(model, bs='re'), data=df)
To add to the complexity, the area that the predictions are modeled from is split into four separate zones, so an additive term of "zone" is included in the model:
m4 <- gam(response~s(time, by=scenario, k=-1) + scenario + zone + s(model, bs='re'), data=df)
My question is: I've accounted for the expectation that the "wigglyness" of the response through time may vary across scenarios in m3 by including s(time, by=scenario, k=-1), but... how do I expand that model to account for the potential for the response to vary not only by scenario but also by zone? I envisioned this as something like:
m5 <- gam(response~s(time, by=scenario:zone, k=-1) + scenario + zone + s(model, bs='re'), data=df)
But this isn't the correct syntax. After reading the honestly excellent paper "Hierarchical generalized additive models in ecology: an introduction with mgcv" in PeerJ several times now, but I'm struggling to understand the leap between the nonlinear functional relationship between a simple x~s(y) gam model and how the shape of the function varies between not one, but two different grouping levels.
As far as I can tell I'd need "A single common smoother plus group-level smoothers with differing wiggliness (Model GI)" approach here to allow each each group-specific smoother (here "scenario" and "zone") to have its own smoothing parameter and hence its own level of wiggliness, but i'm struggling to see how this model would fit into the HGAM approach. How can I adapt M5 to fit within the HGAM framework?