In my correlation ggplot, I would like to add some extra "mini-regression lines" between some pairs of data points.

I have 10 species, which were observed twice (in 2010 and in 2014).

set.seed(42)
obs_2010 <- runif(10, min=1, max=20)
obs_2014 <- runif(10, min=1, max=20)
species <- c("A","B","C","D","E","F","G","H","I","L")
DF <- data.frame(species, obs_2010, obs_2014, stringsAsFactors=T)

I plotted the 2010 values vs the 2014 values, and I got a correlation plot. https://i.stack.imgur.com/W8fM3.jpg

Some of these species are sister species (let's say A-L, B-I, G-H). In addition to the regression line based on all 10 species, I would like to draw a line between points A and L, another line between B and I, another between G and H. Basically, I would like to get this plot (now made with Paint ;) ) https://i.stack.imgur.com/9uXEQ.png

Here are some unsuccessful attempts I made:

#pairs to connect: A-L, B-I, G-H
sister=c(1,2,NA,NA,NA,NA,3,3,2,1)

sistasp <- data.frame(species=DF$species,sister=sister, stringsAsFactors=T)

#trial1
ggplot(DF, aes(x=obs_2010, y=obs_2014)) + 
  geom_point(aes(col=species), shape=16, size=3) + theme_bw() + xlim(0,20) + ylim(0,20) +
  geom_smooth(method=glm, se=F, col="black") +   
  geom_line(aes(group=sister), na.rm=T)
#almost good, but also points with NA (those without sister species) are connected

#trial2
ggplot(DF, aes(x=obs_2010, y=obs_2014)) + 
  geom_point(aes(col=species), shape=16, size=3) + theme_bw() + xlim(0,20) + ylim(0,20) +
  geom_smooth(method=glm, se=F, col="black") +  
  geom_segment(data = merge(DF, sistasp, by = "sister"), 
             aes(x=y2010.x, xend=y2010.y, y=y2014.x, yend=y2014.y))
#error message Error in FUN(X[[i]], ...) : object 'y2010.x' not found

Thank you for your help =)

1 Answer

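I think your first attempt was almost correct. The problem is that geom_line() receives every row, so the species with NA in sister end up in a group of their own and get connected, and na.rm does not help because the NA is in the grouping variable rather than in x or y. Subset the data you pass to geom_line() to only the rows that have sister information. I also added sister as a column of DF so that the grouping variable lives in the same data frame as the points.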

DF$sister <- sister

ggplot(DF, aes(x=obs_2010, y=obs_2014)) + 
  geom_point(aes(col=species), shape=16, size=3) + theme_bw() + xlim(0,20) + ylim(0,20) +
  geom_smooth(method=glm, se=F, col="black") +   
  geom_line(data = DF[!is.na(DF$sister), ], aes(group=sister))

Does that give you what you want?
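
For what it's worth, the geom_segment() idea from your second trial can probably be made to work too. It failed because aes() referred to y2010.x and similar columns, which do not exist: the columns are called obs_2010 and obs_2014, and DF had no sister column to merge on. Here is a minimal sketch of one way to build the segment endpoints, assuming sister is stored in DF as above. The names paired and segs and the suffixes .from/.to are ones I made up for illustration.

```r
library(ggplot2)

# Same data as in the question, with sister stored as a column of DF
set.seed(42)
obs_2010 <- runif(10, min = 1, max = 20)
obs_2014 <- runif(10, min = 1, max = 20)
species <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "L")
sister <- c(1, 2, NA, NA, NA, NA, 3, 3, 2, 1)
DF <- data.frame(species, obs_2010, obs_2014, sister, stringsAsFactors = TRUE)

# Keep only the species that have a sister
paired <- DF[!is.na(DF$sister), ]

# Self-merge on sister so each row holds both endpoints of a segment
# (suffixes .from/.to are illustrative names, not anything ggplot2 requires)
segs <- merge(paired, paired, by = "sister", suffixes = c(".from", ".to"))

# The self-merge also yields A-A, L-L and L-A; keep one row per pair
segs <- segs[as.character(segs$species.from) < as.character(segs$species.to), ]

p <- ggplot(DF, aes(x = obs_2010, y = obs_2014)) +
  geom_point(aes(col = species), shape = 16, size = 3) +
  geom_smooth(method = "glm", formula = y ~ x, se = FALSE, col = "black") +
  geom_segment(data = segs,
               aes(x = obs_2010.from, xend = obs_2010.to,
                   y = obs_2014.from, yend = obs_2014.to)) +
  theme_bw() + xlim(0, 20) + ylim(0, 20)
print(p)
```

With only two points per pair, this should draw the same lines as the geom_line() version. The segment form may be handier if you later want arrows or per-pair styling.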