9
votes

Background

Right now, I'm creating a multiple-predictor linear model and generating diagnostic plots to assess regression assumptions. (It's for a multiple regression analysis stats class that I'm loving at the moment :-)

My textbook (Cohen, Cohen, West, and Aiken 2003) recommends plotting each predictor against the residuals to make sure that:

  1. The residuals don't systematically covary with the predictor
  2. The residuals are homoscedastic with respect to each predictor in the model

On point (2), my textbook has this to say:

Some statistical packages allow the analyst to plot lowess fit lines at the mean of the residuals (0-line), 1 standard deviation above the mean, and 1 standard deviation below the mean of the residuals....In the present case {their example}, the two lines {mean + 1sd and mean - 1sd} remain roughly parallel to the lowess {0} line, consistent with the interpretation that the variance of the residuals does not change as a function of X. (p. 131)

How can I modify loess lines?

I know how to generate a scatterplot with a "0-line,":

    # First, I'll make a simple linear model and get its diagnostic stats
    library(ggplot2)
    data(cars)
    mod <- fortify(lm(speed ~ dist, data = cars))
    attach(mod)
    str(mod)

    # Now I want to make sure the residuals are homoscedastic
    qplot (x = dist, y = .resid, data = mod) + 
    geom_smooth(se = FALSE) # "se = FALSE" Removes the standard error bands

But does anyone know how I can use ggplot2 and qplot to generate plots where the 0-line, "mean + 1sd" AND "mean - 1sd" lines would be superimposed? Is that a weird/complex question to be asking?

3
For what it's worth, I'm not wedded to ggplot2. I've just found it to be a really intuitive and powerful package for data display, particularly since I'm an R novice :-) - briandk
I'm not sure what you want. Isn't that just a 68% confidence interval? I was always taught to plot the absolute residuals and a loess. It's a simpler check for varying variance. - hadley
Hadley - My book gives an example of homoscedasticity, and two examples of heteroscedasticity: picasaweb.google.com/brian.danielak/… In the second and third photos, the 0-line for loess doesn't wiggle much at all, but the +1sd and -1sd lines bring out the pattern in the residuals. Granted, the patterns are visible without any loess, but what if they were less apparent in the plot? I can't tell if my problem is at the coding level or the conceptual stats level. Given the data in those pictures, how would you think about assessing homoscedasticity? - briandk
Correct me if I'm wrong Hadley, but I believe that the standard errors for loess are computed assuming homoskedasicity (via residual.scale). Briandk, this doesn't answer your question, but I would use something like: qplot (x = dist, y = abs(.resid), data = mod) + geom_smooth(), and if the line is not flat, use hccm. - Ian Fellows
... I wish I had known about fortify() 2 hours ago. - Matt Parker

3 Answers

4
votes

Apology

Folks, I want to apologize for my ignorance. Hadley is absolutely right, and the answer was right in front of me all along. As I suspected, my question was born of statistical, rather than programmatic ignorance.

We get the 68% Confidence Interval for Free

geom_smooth() defaults to loess smoothing, and it superimposes the +1sd and -1sd lines as part of the deal. That's what Hadley meant when he said "Isn't that just a 68% confidence interval?" I just completely forgot that's what the 68% interval is, and kept searching for something that I already knew how to do. It didn't help that I'd actually turned the confidence intervals off in my code by specifying geom_smooth(se = FALSE).

What my Sample Code Should Have Looked Like

# First, I'll make a simple linear model and get its diagnostic stats.
library(ggplot2)
data(cars)
mod <- fortify(lm(speed ~ dist, data = cars))
attach(mod)
str(mod)

# Now I want to make sure the residuals are homoscedastic.
# By default, geom_smooth is loess and includes the 68% standard error bands.
qplot (x = dist, y = .resid, data = mod) + 
geom_abline(slope = 0, intercept = 0) +
geom_smooth() 

What I've Learned

Hadley implemented a really beautiful and simple way to get what I'd wanted all along. But because I was focused on loess lines, I lost sight of the fact that the 68% confidence interval was bounded by the very lines I needed. Sorry for the trouble, everyone.

1
votes

Could you calculate the +/- standard deviation values from the data and add a fitted curve of them to the plot?

1
votes

Have a look at my question "modify lm or loess function.."

I am not sure I followed your question very well, but maybe a:

+ stat_smooth(method=yourfunction)

will work, provided that you define your function as described here.