1
votes

I would like to plot a threshold model with smooth 95% confidence interval lines between line segments. You would think this would be on the simple side but I have not been able to find an answer!

My threshold/breakpoints are known, it would be great if there were a way to visualize this data. I have tried the segmented package which produces the following plot:

The plot shows a threshold model with a breakpoint at 5.4. However, the confidence intervals are not smooth between regression lines.

If anyone knows of any way to produce smooth (i.e. without the jump between line segments) CI lines between segmented regression lines (ideally in ggplot) that would be amazing. Thank you so much.

I have included sample data and the code I have tried below:

x <- c(2.26, 1.95, 1.59, 1.81, 2.01, 1.63, 1.62, 1.19, 1.41, 1.35, 1.32, 1.52, 1.10, 1.12, 1.11, 1.14, 1.23, 1.05, 0.95, 1.30, 0.79,
0.81, 1.15, 1.10, 1.29, 0.97, 1.05, 1.05, 0.84, 0.64, 0.80, 0.81, 0.61, 0.71, 0.75, 0.30, 0.30, 0.49, 1.13, 0.55, 0.77, 0.51,
0.67, 0.43, 1.11, 0.29, 0.36, 0.57, 0.02, 0.22, 3.18, 3.79, 2.49, 2.44, 2.12, 2.45, 3.22, 3.44, 3.86, 3.53, 3.13)

y <- c(22.37, 18.93, 16.99, 15.65, 14.62, 13.79, 13.09, 12.49, 11.95, 11.48, 11.05, 10.66, 10.30,  9.96,  9.65,  9.35,  9.07,  8.81,
       8.56,  8.32,  8.09,  7.87,  7.65,  7.45,  7.25,  7.05,  6.86,  6.68,  6.50,  6.32,  6.15,  5.97,  5.80,  5.63,  5.47,  5.30,
        5.13,  4.96,  4.80,  4.63,  4.45,  4.28,  4.09,  3.90,  3.71,  3.50,  3.27,  3.01,  2.70,  2.28, 22.37, 16.99, 11.05,  8.81,
       8.56,  8.32,  7.25,  7.05,  6.50,  6.15,  5.63)

lin.mod <- lm(y ~  x)
segmented.mod <- segmented(lin.mod, seg.Z = ~x, psi=2)
plot(x, y)
plot(segmented.mod, add=TRUE, conf.level = 0.95)

which produces the following plot (and associated jumps in 95% confidence intervals):

segmented plot

1
The problem is that in a segmented regression, the confidence intervals aren't smooth and continuous. If you want to show the 95% confidence intervals, this is what they look like. If you want to show smooth lines, then the plot may look nicer, but what you will be showing won't be 95% confidence intervals. If you want help making the lines smooth, then it can certainly be done, but we'd need to see some sample data and what you've tried so far to be able to help you.Allan Cameron
@AllanCameron Thank you so very much, that of course makes a lot of sense. I have now included some sample data along with the plot produced. I would love to hear any thoughts about producing smoothed lines (of course with the acknowledgement that these would no longer be 95% confidence intervals). Thank you again, I really appreciate your help!Emmax
@AllanCameron, why shouldn't CIs be smooth around the change points? See my answer.Jonas Lindeløv
@JonasLindeløv I really like your answer (upvoted), and I think the second part with the fixed cutoff comes closest to what the OP was looking for. I think your first approach is likely to be a better statistical approach overall, though it's difficult to know without a clear description of the problem. My approach in the past when tempted to do segmented regression is to find a non-linear model. This can occasionally lead to a better insight into the problem at hand - see for example dx.doi.org/10.1136/annrheumdis-2013-203293. But I think your answer should be accepted here.Allan Cameron

1 Answers

2
votes

Background: The non-smoothness in existing change point packages are due to the fact that frequentist packages operate with a fixed change point value. But as with all inferred parameters, this is wrong because there is indeed uncertainty concerning the location of the change.

Solution: AFAIK, only Bayesian methods can quantify that and the mcp package fills this space.

library(mcp)
model = list(
  y ~ 1 + x,   # Segment 1: Intercept and slope
  ~ 0 + x  # Segment 2: Joined slope (no intercept change)
)
fit = mcp(model, data = data.frame(x, y))

Default plot (plot.mcpfit() returns a ggplot object):

plot(fit) + ggtitle("Default plot")

Default mcp plot

Each line represents a possible model that generated the data. The posterior for the change point is shown as a blue density. You can add a credible interval on top using plot(fit, q_fit = TRUE) or plot it alone:

plot(fit, lines = 0, q_fit = c(0.025, 0.975), cp_dens = FALSE) + ggtitle("Credible interval only")

mcp credible interval plot

If your change point is indeed known and if you want to model different residual scales for each segment (i.e., quasi-emulate segmented), you can do:

model2 = list(
  y ~ 1 + x,
  ~ 0 + x + sigma(1)  # Add intercept change in residual scale
)
fit = mcp(model2, df, prior = list(cp_1 = 1.9))  # Note: prior is a fixed value - not a distribution.
plot(fit, q_fit = TRUE, cp_dens = FALSE)

mcp emulating segmented

Notice that the CI does not "jump" around the change point as in segmented. I believe that this is the correct behavior. Disclosure: I am the author of mcp.