Piecewise regression with R: plotting the segments

Question

I have 54 points. They represent offer and demand for products. I would like to show there is a break point in the offer.

First, I sort the x-axis (offer) and remove the values that appears twice. I have 47 values, but I remove the first and last ones (doesn't make sense to consider them as break points). Break is of length 45:

Break<-(sort(unique(offer))[2:46])

Then, for each of these potential break points, I estimate a model and I keep in "d" the residual standard error (sixth element in model summary object).

d<-numeric(45)
for (i in 1:45) {
model<-lm(demand~(offer<Break[i])*offer + (offer>=Break[i])*offer)
d[i]<-summary(model)[[6]] }

Plotting d, I notice that my smaller residual standard error is 34, that corresponds to "Break[34]": 22.4. So I write my model with my final break point:

model<-lm(demand~(offer<22.4)*offer + (offer>=22.4)*offer)

Finally, I'm happy with my new model. It's significantly better than the simple linear one. And I want to draw it:

plot(demand~offer)
i <- order(offer)
lines(offer[i], predict(model,list(offer))[i])

But I have a warning message:

Warning message:
In predict.lm(model, list(offer)) :
  prediction from a rank-deficient fit may be misleading

And more important, the lines are really strange on my plot.

My plot with the supposedly two segments, but not joining

Here are my data:

demand <- c(1155, 362, 357, 111, 703, 494, 410, 63, 616, 468, 973, 235,
            180, 69, 305, 106, 155, 422, 44, 1008, 225, 321, 1001, 531, 143,
            251, 216, 57, 146, 226, 169, 32, 75, 102, 4, 68, 102, 462, 295,
            196, 50, 739, 287, 226, 706, 127, 85, 234, 153, 4, 373, 54, 81,
            18)
offer <- c(39.3, 23.5, 22.4, 6.1, 35.9, 35.5, 23.2, 9.1, 27.5, 28.6, 41.3,
           16.9, 18.2, 9, 28.6, 12.7, 11.8, 27.9, 21.6, 45.9, 11.4, 16.6,
           40.7, 22.4, 17.4, 14.3, 14.6, 6.6, 10.6, 14.3, 3.4, 5.1, 4.1,
           4.1, 1.7, 7.5, 7.8, 22.6, 8.6, 7.7, 7.8, 34.7, 15.6, 18.5, 35,
           16.5, 11.3, 7.7, 14.8, 2, 12.4, 9.2, 11.8, 3.9)

54 points does not seem a very large amount of points to detect such a transistion. You could opt to also post this in stats.stackexchange.com with the specfic question if this is enough points to detect the break in the data. Just my 2 ct. — Paul Hiemstra
I think this is pretty statistically dubious. Better to estimate the breakpoint in the model itself (although that makes it non linear). You can't trust the p-vals or standard errors from the current informal estimation process. — hadley
54 points isn't a large amount, I agree, but both my linear and my piecewise linear regressions are significant. And moreover, the residual standard error of the piecewise linear model is 103.9 compared to 121.3 for the linear model with two less degrees of freedom. The piecewise model is significantly better. — Antonin

Ramnath Ramnath · Accepted Answer · 2012-01-06T18:52:44

Here is an easier approach using ggplot2.

require(ggplot2)
qplot(offer, demand, group = offer > 22.4, geom = c('point', 'smooth'), 
   method = 'lm', se = F, data = dat)

EDIT. I would also recommend taking a look at this package segmented which supports automatic detection and estimation of segmented regression models.

enter image description here

UPDATE:

Here is an example that makes use of the R package segmented to automatically detect the breaks

library(segmented)
set.seed(12)
xx <- 1:100
zz <- runif(100)
yy <- 2 + 1.5*pmax(xx - 35, 0) - 1.5*pmax(xx - 70, 0) + 15*pmax(zz - .5, 0) + 
  rnorm(100,0,2)
dati <- data.frame(x = xx, y = yy, z = zz)
out.lm <- lm(y ~ x, data = dati)
o <- segmented(out.lm, seg.Z = ~x, psi = list(x = c(30,60)),
  control = seg.control(display = FALSE)
)
dat2 = data.frame(x = xx, y = broken.line(o)$fit)

library(ggplot2)
ggplot(dati, aes(x = x, y = y)) +
  geom_point() +
  geom_line(data = dat2, color = 'blue')

segmented

Piecewise regression with R: plotting the segments

3 Answers