## simulate some data - from mgcv::magic
set.seed(1)
n <- 400
x <- 0:(n-1)/(n-1)
f <- 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
y <- f + rnorm(n, 0, sd = 2)
## load the splines package - comes with R
require(splines)
You use the bs()
function in a formula to lm
as you want OLS estimates. bs
provides the basis functions as given by the knots, degree of polynomial etc.
mod <- lm(y ~ bs(x, knots = seq(0.1, 0.9, by = 0.1)))
You can treat that just like a linear model.
> anova(mod)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
bs(x, knots = seq(0.1, 0.9, by = 0.1)) 12 2997.5 249.792 65.477 < 2.2e-16 ***
Residuals 387 1476.4 3.815
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Some pointers on knot placement. bs
has an argument Boundary.knots
, with default Boundary.knots = range(x)
- hence when I specified the knots
argument above, I did not include the boundary knots.
Read ?bs
for more information.
Producing a plot of the fitted spline
In the comments I discuss how to draw the fitted spline. One option is to order the data in terms of the covariate. This works fine for a single covariate, but need not work for 2 or more covariates. A further issue is that you can only evaluate the fitted spline at the observed values of x
- this is fine if you have densely sampled the covariate, but if not, the spline may look odd, with long linear sections.
A more general solution is to use predict
to generate predictions from the model for new values of the covariate or covariates. In the code below I show how to do this for the model above, predicting for 100 evenly-spaced values over the range of x
.
pdat <- data.frame(x = seq(min(x), max(x), length = 100))
## predict for new `x`
pdat <- transform(pdat, yhat = predict(mod, newdata = pdat))
## now plot
ylim <- range(pdat$y, y) ## not needed, but may be if plotting CIs too
plot(y ~ x)
lines(yhat ~ x, data = pdat, lwd = 2, col = "red")
That produces
bs
does, is it not? Cubic polynomials of degree 3 (by default) fitted between the knots with condition that the individual pieces join smoothly at the knots? – Gavin Simpson?bs
page would seem to fully address the question:lm(weight ~ bs(height, df = 3, knots=c(58, 62, 66, 70, 72), ), data = women)
– IRTFM