To diagnose more precisely where the problem is, try fitting your model with various terms left out. There are several terms in the model that could blow up on you:
- the fixed effects involving center will blow up to 300 columns * 10^6 rows; depending on whether year is numeric or a factor, the year*center term could blow up to 600 columns or (nyears*300) columns
- it's not clear to me whether bam uses sparse matrices for s(.,bs="re") terms; if not, you'll be in big trouble (2*10^5 columns * 10^6 rows)
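To see how these terms expand without touching the full data set, you can count model-matrix columns on a toy example. This is only a sketch: the data frame `d`, its 5 centers and 3 years are made-up stand-ins, and `year_f` is a hypothetical factor copy of `year`.

```r
# Toy stand-in for the real data: 5 centers x 3 years, 4 replicates each
d <- expand.grid(center = factor(paste0("c", 1:5)),
                 year   = 2001:2003,
                 rep    = 1:4)
d$year_f <- factor(d$year)  # hypothetical factor version of year

# year numeric: intercept + year + (ncenters - 1) dummies + (ncenters - 1) slopes
n_numeric <- ncol(model.matrix(~ year * center, data = d))

# year factor: one column per year-by-center combination
n_factor <- ncol(model.matrix(~ year_f * center, data = d))

c(numeric = n_numeric, factor = n_factor)
```

With 5 centers and 3 years this gives 2 * 5 columns in the numeric case and 3 * 5 in the factor case, the same pattern as 600 versus nyears * 300 for 300 centers.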
As an order-of-magnitude estimate, a vector of 10^6 numeric values (one column of your model matrix) takes 7.6 MB, so 500 GB / 7.6 MB would be approximately 65,000 columns ...
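The arithmetic can be checked in a few lines of R. This is a rough sketch that assumes 8-byte doubles, a dense matrix, and binary units for the 500 GB figure; the exact count shifts by a few thousand columns depending on whether decimal or binary units are used, so treat the result as order-of-magnitude only.

```r
n_rows <- 1e6

# One dense numeric column: 8 bytes per double
bytes_per_col <- 8 * n_rows
mb_per_col <- bytes_per_col / 2^20      # about 7.6 MB

# Assumption: 500 GB of RAM, counted in binary units
ram_bytes <- 500 * 2^30
max_cols <- floor(ram_bytes / bytes_per_col)

# Size of a dense 10^6 x 2*10^5 random-effects block, in the same units
dense_re_gb <- (2e5 * bytes_per_col) / 2^30

c(mb_per_col = mb_per_col, max_cols = max_cols, dense_re_gb = dense_re_gb)
```

The dense random-effects block comes out at roughly three times the available memory, which is why the sparse-matrix question matters.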
Just taking a guess here, but I would try out the gamm4 package. It's not specifically geared for low-memory use, but:
‘gamm4’ is most useful when the random effects are not i.i.d., or
when there are large numbers of random coeffecients [sic] (more than
several hundred), each applying to only a small proportion of the
response data.
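I would also make most of the terms into random effects (gamm4 takes these in lme4 style through its random argument rather than in the main formula):
gamm4::gamm4(haz ~ s(month, bs = "cc", k = 12) + sex + s(age),
    random = ~ (1|center) + (1|year) + (1|year:center) + (1|child),
    data = data)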
or, if there are not very many years in the data set, treat year as a fixed effect:
gamm4::gamm4(haz ~ s(month, bs = "cc", k = 12) + sex + s(age) + year,
    random = ~ (1|center) + (1|year:center) + (1|child),
    data = data)
If there are a small number of years then (year|center) might make sense, to assess among-center variation and covariation among years ... if there are many years, consider making it a smooth term instead ...