5
votes

I wanted to be clear and use the :: notation in the lines for fitting an mgcv::gam. I stumbled over one thing when using the notation within the model call for mgcv::s. The code with a reproducible example / error is shown below.

The reason is probably because I am using this notation within the model formula, but I could not figure out why this does not work / is not allowed. This is probably something quite specific concerning syntax (probably not mgcv specific, I guess), but maybe somebody can help me in understanding this and my understanding of R. Thank you in advance.

library(mgcv)
dat <- data.frame(x = 1:10, y = 101:110)
# this results in an error: invalid type (list)...
mgcv::gam(y ~ mgcv::s(x, bs = "cs", k = -1), data = dat)
# after removing the mgcv:: in front of s everything works fine
mgcv::gam(y ~ s(x, bs = "cs", k = -1), data = dat)

# outside of the model call, both calls return the desired function
class(s)
# [1] "function"
class(mgcv::s)
# [1] "function"
2
this is a good question, but it's kind of a case of "doctor, it hurts when I do this." "well, then don't do that." I appreciate the desire for clarity, but if you were thinking of using s() for anything other than a smooth term in a gam() formula, that would be pretty crazy ...Ben Bolker
You are right. I am just starting to learn this kind of modelling and was not sure if there might be any function called s in other modellng packages and, therefore, wanted to be clear. I guess that as soon as gamfrom mcgv is called, the call would probably parse everything within the model call intepreted as syntax from mcgv. So the mgcv:: in front of s is not necessary. Still, using this notation should not result in an error to be consistent with the general notation. But since I know I should not use this notation in this specific case I will follow the doctor`s advice ;-).Manuel Bickel
Think of the formula as a symbolic representation of the model, not R code. Whilst you are right that any s() lurking ahead of mgcv:s on the search path will interfere, manipulating formulas is one of the very hard things you can attempt to do with R.Gavin Simpson
@GavinSimpson As you said below, the issue I pointed out is very specific and time can be spent much better than dealing with this one. Nevertheless, I have learned several things about R here, so thank you for your explanations and time spent on this minor issue.Manuel Bickel

2 Answers

4
votes

Explanation

library(mgcv)
#Loading required package: nlme
#This is mgcv 1.8-24. For overview type 'help("mgcv-package")'.

f1 <- ~ s(x, bs = 'cr', k = -1)
f2 <- ~ mgcv::s(x, bs = 'cr', k = -1)

OK <- mgcv:::interpret.gam0(f1)$smooth.spec
FAIL <- mgcv:::interpret.gam0(f2)$smooth.spec

str(OK)
# $ :List of 10
#  ..$ term   : chr "x"
#  ..$ bs.dim : num -1
#  ..$ fixed  : logi FALSE
#  ..$ dim    : int 1
#  ..$ p.order: logi NA
#  ..$ by     : chr "NA"
#  ..$ label  : chr "s(x)"
#  ..$ xt     : NULL
#  ..$ id     : NULL
#  ..$ sp     : NULL
#  ..- attr(*, "class")= chr "cr.smooth.spec"

str(FAIL)
# list()

The 4th line of the source code of interpret.gam0 reveals the issue:

head(mgcv:::interpret.gam0)

1 function (gf, textra = NULL, extra.special = NULL)              
2 {                                                               
3     p.env <- environment(gf)                                    
4     tf <- terms.formula(gf, specials = c("s", "te", "ti", "t2", 
5         extra.special))                                         
6     terms <- attr(tf, "term.labels") 

Since "mgcv::s" is not to be matched, you get the problem. But mgcv does allow you the room to work around this, by passing "mgcv::s" via argument extra.special:

FIX <- mgcv:::interpret.gam0(f, extra.special = "mgcv::s")$smooth.spec
all.equal(FIX, OK)
# [1] TRUE

It is just that this is not user-controllable at high-level routine:

head(mgcv::gam, n = 10)

#1  function (formula, family = gaussian(), data = list(), weights = NULL, 
#2      subset = NULL, na.action, offset = NULL, method = "GCV.Cp",        
#3      optimizer = c("outer", "newton"), control = list(), scale = 0,     
#4      select = FALSE, knots = NULL, sp = NULL, min.sp = NULL, H = NULL,  
#5      gamma = 1, fit = TRUE, paraPen = NULL, G = NULL, in.out = NULL,    
#6      drop.unused.levels = TRUE, drop.intercept = NULL, ...)             
#7  {                                                                      
#8      control <- do.call("gam.control", control)                         
#9      if (is.null(G)) {                                                  
#10         gp <- interpret.gam(formula)  ## <- default to extra.special = NULL

I agree with Ben Bolker. It is a good exercise to dig out what happens inside, but is an over-reaction to consider this as a bug and fix it.


More insight:

s, te, etc. in mgcv does not work in the same logic with stats::poly and splines::bs.

  • When you do for example, X <- splines::bs(x, df = 10, degree = 3), it evaluates x and create a design matrix X directly.
  • When you do s(x, bs = 'cr', k = 10), no evaluation is made; it is parsed.

Smooth construction in mgcv takes several stages:

  1. parsing / interpretation by mgcv::interpret.gam, which generates a profile for a smoother;
  2. initial construction by mgcv::smooth.construct, which sets up basis / design matrix and penalty matrix (mostly done at C-level);
  3. secondary construction by mgcv::smoothCon, which picks up "by" variable (duplicating smooth for factor "by", for example), linear functional terms, null space penalty (if you use select = TRUE), penalty rescaling, centering constraint, etc;
  4. final integration by mgcv:::gam.setup, which combines all smoothers together, returning a model matrix, etc.

So, it is a far more complicated process.

2
votes

This looks like an mgcv issue. For example, the lm() function accepts poly() or stats::poly() and gives the same results (other than the names of things):

> x <- 1:100
> y <- rnorm(100)
> lm(y ~ poly(x, 3))

Call:
lm(formula = y ~ poly(x, 3))

Coefficients:
(Intercept)  poly(x, 3)1  poly(x, 3)2  poly(x, 3)3  
    0.07074      0.13631     -1.52845     -0.93285  

> lm(y ~ stats::poly(x, 3))

Call:
lm(formula = y ~ stats::poly(x, 3))

Coefficients:
       (Intercept)  stats::poly(x, 3)1  stats::poly(x, 3)2  stats::poly(x, 3)3  
           0.07074             0.13631            -1.52845            -0.93285  

It also works with the splines::bs function, so this isn't specific to poly().

You should contact the mgcv maintainer and point out this bug in that package. I'd guess it is looking specifically for s, rather than for an expression like mgcv::s that evaluates to the same thing.