Plotting predicted values from a lme model (with polynomials) in R

Question

I am using linear mixed-effect model (run with the lme() function in the nlme package in R) that has one fixed effect, and one random intercept term (to account for different groups). The model is a cubic polynomial model specified as so (following advice given below):

   M1 = lme(dv ~ poly(iv,3), data=dat, random= ~1|group, method="REML")

Some example data only:

> dput(dat)
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"2"), class = "factor"), iv = c(24L, 100L, 110L, 115L, 116L, 
120L, 125L, 127L, 138L, 139L, 142L, 150L, 152L, 154L, 157L, 161L, 
168L, 177L, 181L, 189L, 190L, 198L, 200L, 213L, 216L, 225L, 254L, 
284L, 40L, 51L, 76L, 130L, 155L, 158L, 160L, 163L, 167L, 169L, 
170L, 177L, 185L, 190L, 203L, 206L, 208L, 219L, 223L, 233L, 238L, 
244L, 251L, 260L, 265L), dv = c(0L, 8L, 6L, 8L, 10L, 10L, 9L, 
11L, 12L, 15L, 16L, 19L, 13L, 10L, 17L, 22L, 18L, 22L, 25L, 20L, 
27L, 28L, 29L, 30L, 29L, 30L, 30L, 30L, 0L, 0L, 2L, 7L, 14L, 
12L, 17L, 10L, 14L, 13L, 16L, 15L, 17L, 21L, 25L, 20L, 26L, 27L, 
28L, 29L, 30L, 30L, 30L, 30L, 30L)), .Names = c("group", "iv", 
"dv"), row.names = c(NA, -53L), class = "data.frame")

I would now like to plot fitted values using the predict function (the values of iv are not continuous in the dataset so I would like to improve the appearance /smoothness of a fitted curve).

Using on-line examples on how to plot predicted values from a simple lme model (without polynomials) (see here: Extract prediction band from lme fit and http://glmm.wikidot.com/faq), I can plot predicted ‘population’ means for an lme without polynomials using the below code:

#model without polynomials
dat$group = factor(dat$group)
M2 = lme(dv ~ iv, data=dat, random= ~1|group, method="REML")

#1.create new data frame with new values for predictors (where groups aren't accounted for)
range(dat$iv)
new.dat = data.frame(iv = seq(from =24, to =284, by=1))

#2. predict the mean population response
new.dat$pred = predict(M2, newdata=new.dat, level=0)

#3. create a design matrix
Designmat <- model.matrix(eval(eval(M2$call$fixed)[-2]), new.dat[-ncol(new.dat)])

#4. get standard error and CI for predictions
predvar <- diag(Designmat %*% M2$varFix %*% t(Designmat)) 
new.dat$SE <- sqrt(predvar) 
new.dat$SE2 <- sqrt(predvar+M2$sigma^2)

# Create plot with different colours for grouping levels and plot predicted values for population mean
G1 = dat[dat$group==1, ]
G2 = dat[dat$group==2, ]

plot(G1$iv, G1$dv, xlab="iv", ylab="dv", ylim=c(0,30), xlim=c(0,350), pch=16, col=2)
points(G2$iv, G2$dv, xlab="", ylab="", ylim=c(0,30), xlim=c(0,350), pch=16, col=3)

F0 = new.dat$pred
I = order(new.dat$iv); eff = sort(new.dat$iv)
lines(eff, F0[I], lwd=2, type="l", ylab="", xlab="", col=1, xlim=c(0,30))
#lines(eff, F0[I] + 2 * new.dat$SE[I], lty = 2)
#lines(eff, F0[I] - 2 * new.dat$SE[I], lty = 2)

I would like expand this code to 1) plot the within-group predicted lines as well as the mean population values and 2) determine how the code can be adapted to plot predicted ‘population’ and ‘within-group’ curves for an lme with polynomials (i.e. model M1 above).

Obtaining group predictions: I can obtain one set of predicted values for groups using the below code, but I would like to plot a line for each group, as well as the population mean, and in the case of the example data I cannot see how predicted values for two group lines could be extracted?

new.dat = data.frame(iv = dat$iv, group=rep(c("1","2"),c(28,25)))
Pred = predict(M2, newdata=new.dat, level=0:1)

Also, this does not work if you want to predict a larger number of values than the number of original iv values (e.g. in cases where you have irregular data). The below obviously won’t work because of a differing number of rows, but I am struggling with the syntax.

new.dat = data.frame(iv = seq(from =24, to =284, by=1), group=rep(c("1","2"),c(28,25)))

For a polynomial model: I don't understand how one would incorporate poly(iv,3) into a new.dat data frame to feed into the predict function.

Any advice of how to achieve these two goals would be much appreciated as I have been trying to figure this out for a while with no joy (I would rather use base graphics than ggplot if possible). Thanks!

This is so wrong (as well as unclear by not saying what you mean by "did not work".) Do not construct naive polynomial terms by squaring and cubing terms. Use poly(iv, 3) instead. — IRTFM
If you plot this with xyplot( dv ~ iv|group, dat) it becomes apparent that the highly significant cubic term is an artifact of a ceiling effect because your 'dv' maxes out at 30. — IRTFM
As usual, @BondedDust has the authoritative answer on statistical models and R. The only comment I have is to read help(nlme:::predict.lme). In it you'll find out how to get predictions for each group. — Jthorpe
Generally you will not get good results with the use of double or triple colons for accessing help files. With pkg:nlme loaded this would succeed: help(predict.lme) or without loading it: help(predict.lme, pack=nlme) — IRTFM
@ Bonded dust. Thanks for the advice re: the use or ‘inappropriate use’ of polynomials and the poly() function. Very helpful, but the data above is not real data and was only provided to give an example for the purpose of finding out how one would predict values. Apologies if it isn’t the best example data and if my question was not clear, I have made some edits in an attempt to clarify my issue. I am still a relative beginner in R so please bear with me! The help file on predict.lme does not give much information – that was my first port of call. Thanks again for the polynomial advice. — jjulip

IRTFM IRTFM · Accepted Answer · 2015-01-31T20:43:56

Let me explain in more detail why I think your are jumping into non-linear terms too quickly and should be spending more time examining your data before considering polynomial terms:

First the more correct way to enter polynomial terms of 2nd and third order:

> M1 = lme(dv ~ poly(iv ,3), data=dat, random= ~1|group, method="REML")
> summary(M1)
Linear mixed-effects model fit by REML
 Data: dat 
       AIC      BIC    logLik
  245.4883 256.8393 -116.7442

Random effects:
 Formula: ~1 | group
        (Intercept) Residual
StdDev:    2.465855 2.435135

Fixed effects: dv ~ poly(iv, 3) 
                 Value Std.Error DF   t-value p-value
(Intercept)   18.14854  1.775524 48 10.221507  0.0000
poly(iv, 3)1  64.86375  2.476145 48 26.195452  0.0000
poly(iv, 3)2   2.76606  2.462331 48  1.123349  0.2669
poly(iv, 3)3 -13.90253  2.485106 48 -5.594339  0.0000
 Correlation: 
             (Intr) p(,3)1 p(,3)2
poly(iv, 3)1 -0.002              
poly(iv, 3)2 -0.002  0.027       
poly(iv, 3)3  0.002 -0.036 -0.030

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.6349301 -0.6172897  0.1653097  0.7076490  1.6581112 

Number of Observations: 53
Number of Groups: 2

Now, why would the cubic term be significant when the quadratic term was not? Look at the data ... which should have been the first order of business rather than an after-thought:

library(lattice)
xyplot( dv ~ iv|group, dat)
png(); print(xyplot( dv ~ iv|group, dat) ); dev.off()

enter image description here

As becomes apparent with a simple plotting call their is a systematic cutoff at 30 (and possibly at 0 although the data is a bit sparse down there). So you would be attributing a ceiling effect imposed by constraints your measurement methods to some sort of non-linear term.

Plotting predicted values from a lme model (with polynomials) in R

2 Answers