I am fitting GAMs to data using the mgcv package in R. Some of my predictors are circular, so I am using a periodic smoother. I run into a problem in cross-validation: my holdout dataset can contain values outside the range of the training data. Since gam() automatically chooses the knots for the smooths, this leads to an error (see my related question here -- thanks to @nograpes and @DWin for their explanations of the errors there).

How can I manually specify the outer knots in a periodic smooth?

Example code

The first block generates some data.

library(mgcv)

set.seed(223) # produces error.
# set.seed(123) # no error.

# generate data:
x <- runif(100,min=-pi,max=pi)
linPred <- 2*cos(x) # value of the linear predictor
theta <- 1 / (1 + exp(-linPred)) # inverse logit: probability that y = 1
y <- rbinom(100,1,theta)
plot(x,theta)
df <- data.frame(x=x,y=y)

The next block fits the GAM model with the periodic smooth:

gamFit <- gam(y ~ s(x,bs="cc",k=5),data=df,family=binomial())
summary(gamFit)
plot(gamFit)

I assume the knots can be set somewhere in the specification of the smooth term s(x,bs="cc",k=5), but how to do so is not obvious to me from the help for gam or from googling.

This block predicts on some holdout data and produces the error if you set the seed as above:

# predict y values for new data:
x.2 <- runif(100,min=-pi,max=pi)
df.2 <- data.frame(x=x.2)
predict(gamFit,newdata=df.2)
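
To check that out-of-range holdout values are really what triggers the error, the two ranges can be compared directly. This is only a sketch in base R: it repeats the data generation above so that it runs on its own, and the names train_range and outside are made up for illustration.

```r
# Recreate the random draws in the same order as the blocks above
# (the rbinom() call is kept so the random number stream matches).
set.seed(223)
x   <- runif(100, min = -pi, max = pi)
y   <- rbinom(100, 1, 1 / (1 + exp(-2 * cos(x))))
x.2 <- runif(100, min = -pi, max = pi)

# Range actually covered by the training covariate
train_range <- range(x)

# Flag holdout values that fall outside that range
outside <- x.2 < train_range[1] | x.2 > train_range[2]

sum(outside)   # number of holdout points outside the training range
x.2[outside]   # the offending values, if any
```

If sum(outside) is greater than zero, the default knots (placed over the observed range of x) do not cover the holdout data.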

Ideally, I would only set the outer knots and let gam pick the rest.

Apologies if this question is better for CrossValidated than SO.

1 Answer

Try this:

gamFit <- gam(y ~ s(x,bs="cc",k=5), 
              knots=list( x=seq(-pi,pi, len=5) ), 
              data=df, family=binomial())

You will find a worked example at:

?smooth.construct.cr.smooth.spec 

I learned in testing this code that the 'k' parameter in s() needs to match the 'len' parameter of the seq() call supplied as the 'x' element of the knots list. Note that knots is an argument to gam(), not to s(); I had incorrectly thought that it would get passed to s().
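
Regarding setting only the outer knots: as far as I know, the cyclic ("cc") constructor in mgcv treats a knots vector of length 2 as the two end points and places the remaining knots itself. That behaviour is an assumption here and may depend on the mgcv version, so please check it against your installed help pages. A minimal sketch, with the data generation repeated so that it runs on its own and gamFit2 as an illustrative name:

```r
library(mgcv)

set.seed(223)
x  <- runif(100, min = -pi, max = pi)
y  <- rbinom(100, 1, 1 / (1 + exp(-2 * cos(x))))
df <- data.frame(x = x, y = y)

# Assumption: for bs = "cc", two supplied knots are taken as the
# boundaries of the period and the interior knots are chosen by mgcv.
gamFit2 <- gam(y ~ s(x, bs = "cc", k = 5),
               knots = list(x = c(-pi, pi)),
               data = df, family = binomial())

# Holdout data drawn from the full circle
df.2 <- data.frame(x = runif(100, min = -pi, max = pi))
p <- predict(gamFit2, newdata = df.2)

# Knot locations stored on the fitted smooth
gamFit2$smooth[[1]]$xp
```

If the stored knots run from -pi to pi, any holdout value in that interval is covered, whatever the observed range of the training data.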