2
votes

I am trying to extract the placement of the knots from a GAM model in order to delineate my predictor variable into categories for another model. My data contains a binary response variable (used) and a continuous predictor (open).

data <- data.frame(Used = rep(c(1,0,0,0),1250),
                   Open = round(runif(5000,0,50), 0))

I fit the GAM as such:

mod <- gam(Used ~ s(Open), binomial, data = data)

I can get the predicted values, and the model matrix etc with either type=c("response", "lpmatrix") within the predict.gam function but I am struggling with out to extract the knot locations at which which the coefficients change. Any suggestion is really appreciated!

out<-as.data.frame(predict.gam(model1, newdata = newdat, type = "response"))

I would also be interested if possible to do something like:

http://www.fromthebottomoftheheap.net/2014/05/15/identifying-periods-of-change-with-gams/

in which the statistical increase/decrease of the splines is identified, however, I am not using a GAMM at this point, and thus, am having problems identifying the similar model characteristics in GAM that are extracted from his GAMM model. This second item is more out of curiosity than anything.

1

1 Answers

4
votes

Comments:

  1. You should have tagged your question with R and mgcv when asking;
  2. At first I want to flag your question as duplicate to mgcv: how to extract knots, basis, coefficients and predictions for P-splines in adaptive smooth? raised yesterday, and my answer there should be pretty useful. But then I realized that there is actually some difference. So I will make some brief explanation here.

Answer:

In your gam call:

mod <- gam(Used ~ s(Open), binomial, data = data)

you did not specify bs argument in s(), therefore the default basis: bs = 'tp' will be used.

'tp', short for thin-plate regression spline, is not a smooth class that has conventional knots. Thin plate spline does have knots: it places knots exactly at data points. For example, if you have n unique Open values, then it has n knots. In univariate case, this is just a smoothing spline.

However, thin plate regression spline is a low rank approximation to full thin-plate spline, based on truncated eigen decomposition. This is a similar idea to principal components analysis(PCA). Instead of using the original n number of thin-plate spline basis, it uses the first k principal components. This reduces computation complexity from O(n^3) down to O(nk^2), while ensuring optimal rank-k approximation.

As a result, there is really no knots you can extract for a fitted thin-plate regression spline.

Since you work with univariate spline, there is really no need to go for 'tp'. Just use bs = 'cr', the cubic regression spline. This used to be the default in mgcv before 2003, when tp became available. cr has knots, and you can extract knots as I showed in my answer. Don't be confused by the bs = 'ad' in that question: P-splines, B-splines, natural cubic splines, are all knots-based splines.