0
votes

Is there a way to limit the data range of an abline or smooth line in ggplot? For instance, exponentially distributed data can have significant leading outliers as well as a long but fairly uninteresting tail:

library(ggplot2)
d <- sort(rexp(100, rate = 1), decreasing = TRUE)
ggplot(NULL, aes(seq_along(d), d)) + geom_point() + scale_y_log10() + geom_smooth(method = lm, se = FALSE)

[Image: the scatter plot on a log10 y-axis, with ggplot's blue fitted line and a hand-drawn red line]

The blue line is ggplot's; the red one I added by hand to show the line I'd like to get by constraining geom_smooth to an x-range of, say, 12 to 80. The aim is, for instance, to show the domain in which a hypothesised relationship between the variables might hold once special cases and the long tail are set aside. Any advice on how this might be achieved is appreciated.

2
Illustrator is of course an option, but it would be nice to get geom_smooth's SE confidence thingy in on the action.
- geotheory

2 Answers

1
votes

Try this:

library(ggplot2)
set.seed(1)
d <- sort(rexp(100, rate = 1), decreasing = TRUE)
gg <- data.frame(x = seq_along(d), y = d)
ggplot(gg, aes(x, y)) +
  geom_point() +
  scale_y_log10() +
  # Fit and draw the line using only the rows with x from 12 to 80
  geom_smooth(data = gg[gg$x > 11 & gg$x < 81, ], method = lm, se = FALSE)
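
Note that a layer's data argument decides which rows the smoother is fitted on, so the line above is a regression on rows 12 to 80 only, not a clipped view of the full-data fit. A minimal sketch to check this, assuming the same gg data frame as above (rebuilt here so the block runs on its own); the names fit_all and fit_sub are only illustrative. Because scale_y_log10() transforms y before the smoother runs, the comparable model is log10(y) ~ x:

```r
set.seed(1)
d <- sort(rexp(100, rate = 1), decreasing = TRUE)
gg <- data.frame(x = seq_along(d), y = d)

# Same model the smoother sees on a log10 y scale, fitted on all rows
fit_all <- lm(log10(y) ~ x, data = gg)

# Same model fitted only on the rows passed to geom_smooth(data = ...)
fit_sub <- lm(log10(y) ~ x, data = gg[gg$x > 11 & gg$x < 81, ])

# Intercept and slope of each fit, side by side
print(rbind(all = coef(fit_all), subset = coef(fit_sub)))
```

Leaving se at its default (TRUE) on the same geom_smooth layer should add the confidence band, which would then also be computed from the subset fit.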

0
votes

Unfortunately, I do not have the rep to comment on @jlhoward's post, but I would like to ask: does limiting the data in this way affect the resulting regression line? That is, does subsetting exclude points from the calculation, or does it only change what is displayed?

For example, I wish to perform the following:

# Adding "volume" to the diamonds data frame.
diamonds$volume = diamonds$x * diamonds$y * diamonds$z

ggplot(aes(x = volume, y = price), data = subset(diamonds, volume != 0 & volume < 800)) + 
  geom_point(alpha = 1/50, color = '#7ea4b3') + 
  geom_smooth(method = 'lm')

But the line is longer than I want. I would like to cut it off at about x = 600:

ggplot(aes(x = volume, y = price), data = subset(diamonds, volume != 0 & volume < 800)) + 
  geom_point(alpha = 1/50, color = '#7ea4b3') + 
  geom_smooth(data = subset(diamonds, volume > 0 & volume < 600),
              method = 'lm')

Does this modify the formula of the regression line? Is there any way to check what the formula would be, to see whether it has changed?
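
One way to check, sketched here under the assumption that the default formula y ~ x is what geom_smooth(method = 'lm') uses: fit the same model with lm() on each data set and compare the coefficients. If the aim is to keep the fit from all plotted points and only shorten the drawn line, one option is to predict from that fit over a restricted range and draw the result with geom_line(). The names dia, plot_data, fit_full, fit_sub and line_df are made up for this example.

```r
library(ggplot2)

# Work on a copy so the packaged diamonds data set is left untouched
dia <- as.data.frame(diamonds)
dia$volume <- dia$x * dia$y * dia$z
plot_data <- subset(dia, volume != 0 & volume < 800)

# Same linear model on all plotted points and on the narrower subset
fit_full <- lm(price ~ volume, data = plot_data)
fit_sub  <- lm(price ~ volume, data = subset(plot_data, volume < 600))
print(rbind(full = coef(fit_full), subset = coef(fit_sub)))

# Keep the full-data fit, but evaluate it only up to volume = 600
line_df <- data.frame(
  volume = seq(min(plot_data$volume), 600, length.out = 100)
)
line_df$price <- predict(fit_full, newdata = line_df)

p <- ggplot(plot_data, aes(x = volume, y = price)) +
  geom_point(alpha = 1/50, color = '#7ea4b3') +
  geom_line(data = line_df, color = 'blue')
print(p)
```

If the two rows of coefficients differ, the subset passed to geom_smooth changed the regression itself and not just the length of the line.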