I am currently modelling gene expression over time using the BAM function from the MGCV package. There are certain genes which take a very long time to model, so I decided to try setting the discrete option to True. However, now the summaries of the outputs appear to vary massively between models. This is especially evident in the significance of the random effects terms and the total deviance explained by the model.
Is setting discrete to true somehow impacting the modelling of the random effects?
The models were fit as seen here:
bam(value ~ oGeno +
s(age, bs = "gp", k = 8, m=2) +
s(age, by = oGeno, bs ="gp", k = 8, m=1) +
s(age, sex, bs = "re") +
s(litter, bs = "re") +
s(hash, bs = "re") +
s(hpool, bs = "re"),
data = data, family = family, method="REML"
bam(value ~ oGeno +
s(age, bs = "gp", k = 8, m=2) +
s(age, by = oGeno, bs ="gp", k = 8, m=1) +
s(age, sex, bs = "re") +
s(litter, bs = "re") +
s(hash, bs = "re") +
s(hpool, bs = "re"),
data = data, family = family, discrete = T, method="fREML"
and generate these summaries
When discrete is false:
Family: Tweedie(p=1.32)
Link function: log
Formula:
value ~ oGeno + s(age, bs = basis, k = k, m = 2) + s(age, by = oGeno,
bs = basis, k = k, m = 1) + s(age, sex, bs = "re") + s(litter,
bs = "re") + s(hash, bs = "re") + s(hpool, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.78886 0.05635 31.744 <2e-16 ***
oGeno.L -0.05029 0.04337 -1.159 0.246
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(age) 4.813266 4.864 48.547 < 2e-16 ***
s(age):oGenomut 3.100540 3.188 7.298 7.00e-05 ***
s(age,sex) 0.005974 1.000 0.039 0.188
s(litter) 13.606287 89.000 5.819 0.199
s(hash) 82.078473 148.000 38.136 8.22e-06 ***
s(hpool) 24.505710 28.000 654.942 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.259 Deviance explained = 33.5%
-REML = 27952 Scale est. = 1.9063 n = 15421
and when discrete is True:
Family: Tweedie(p=1.32)
Link function: log
Formula:
value ~ oGeno + s(age, bs = basis, k = k, m = 2) + s(age, by = oGeno,
bs = basis, k = k, m = 1) + s(age, sex, bs = "re") + s(litter,
bs = "re") + s(hash, bs = "re") + s(hpool, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.77812 0.05850 30.393 <2e-16 ***
oGeno.L -0.07017 0.04388 -1.599 0.11
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(age) 4.8187 4.870 31.101 < 2e-16 ***
s(age):oGenomut 3.1031 3.192 4.819 0.00108 **
s(age,sex) 0.3756 1.000 122.771 0.98174
s(litter) 13.3135 93.000 68.682 1.00000
s(hash) 81.8371 152.000 140.906 1.00000
s(hpool) 24.5326 29.000 698.652 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.259 Deviance explained = 27.6%
fREML = 28369 Scale est. = 1.9063 n = 15421