
I have four series that I would like to plot.
There are two models: xg and algo30. There are two types of data: predicted and observed.
This gives the following four series: "predicted xg", "observed xg", "predicted 30", "observed 30".

I want "xg" to be blue and "algo30" to be red. I also want the predicted values drawn as a solid line and the observed values drawn as points.
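Put another way, the four series are the two models crossed with the two data types. A minimal sketch of holding them in one long data frame, with one row per (model, type, decile), so that model can drive the color and type can drive point versus line. This assumes the tidyr package, which the question does not use; the column names type and value are hypothetical, and the predicted values here are plain lm fits without the small noise the question adds.

```r
library(tidyr)

set.seed(123)
decile <- 1:10
obs.xg <- sort(runif(10, 0.5, 1))
obs.30 <- sort(runif(10, 0.5, 1))

# Wide layout: one row per (model, decile), observed and predicted side by side.
wide <- data.frame(
  model     = rep(c("xg", "algo30"), each = 10),
  decile    = rep(decile, 2),
  observed  = c(obs.xg, obs.30),
  predicted = c(fitted(lm(obs.xg ~ decile)), fitted(lm(obs.30 ~ decile)))
)

# Long layout: one row per (model, type, decile); the four series are
# the model x type combinations.
long <- pivot_longer(wide, c(observed, predicted),
                     names_to = "type", values_to = "value")

table(long$model, long$type)   # 10 rows for each of the 4 series
```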

Here is what I mean, using base graphics:

library(magrittr)
library(ggplot2)
library(dplyr)

set.seed(123)
gr <- 1:10
obs.xg <- sort(runif(10, 0.5, 1))
obs.30 <- sort(runif(10, 0.5, 1))
pred.xg <- lm(obs.xg ~ gr) %>% predict() %>% add(rnorm(10, 0, .01))
pred.30 <- lm(obs.30 ~ gr) %>% predict() %>% add(rnorm(10, 0, .01))
plot(gr, obs.xg, col="darkblue", ylim=range(c(obs.xg,obs.30)), pch=20)
lines(gr, pred.xg, col="darkblue", lwd=2)
points(gr, obs.30, col="firebrick", pch=20)
lines(gr, pred.30, col="firebrick", lwd=2)
legend("bottomright", 
  pch=c(20,NA,NA,NA,NA),
  lty=c(NA,1,NA,1,1),
  lwd=c(NA,1,NA,2,2),
  col = c("black","black",NA, "darkblue","firebrick"),
  legend=c("observé","prédit",NA,"xgboost","algo30"),
  bty='n')


Here is my best attempt using ggplot2. Notice that the legend doesn't work the way I want.

xg.data <- data.frame(model = "xg", decile = 1:10, observed = obs.xg, predicted = pred.xg)
algo30.data <- data.frame(model = "algo30", decile = 1:10, observed = obs.30, predicted = pred.30)
ggplotdata <- bind_rows(xg.data, algo30.data)

ggplotdata %>%
  ggplot(aes(x = decile, y = predicted, color = model)) +
  geom_line() +
  geom_point(aes(y = observed))


Don't forget to include your packages as part of the question (I have no idea where add() comes from). - aosmith
My bad, I lost the first line when copy-pasting. It's magrittr::add. - Zoltan
What's wrong with base R? ;) - Bastien

1 Answer


Most of the time, when making a legend like this, I reach for override.aes in guide_legend().

The idea is to build the legend from an additional aesthetic that should not visibly change the plot itself, and to map that aesthetic to constants rather than to a variable. I used alpha, since both points and lines understand that aesthetic.
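To see the constant-mapping trick in isolation, here is a minimal sketch on made-up data (df, x, y and the label "predicted" are hypothetical, not from the question). Mapping a string constant inside aes() gives the alpha scale a single level, which is enough to produce a legend key, while values = 1 keeps the line fully opaque; layer_data() shows what ggplot2 will actually draw.

```r
library(ggplot2)

# Hypothetical toy data, only for illustration.
df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 6))

# alpha is mapped to a constant string, not a column: the alpha scale gets
# one level ("predicted") and therefore one legend key.
p <- ggplot(df, aes(x, y)) +
  geom_line(aes(alpha = "predicted")) +
  scale_alpha_manual(name = NULL, values = 1)

# The drawn line is unaffected: every row is mapped to alpha = 1.
layer_data(p, 1)$alpha
```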

Then scale_alpha_manual does the heavy lifting: name = NULL removes the legend title, values = c(1, 1) keeps the plot looking unchanged, and override.aes picks the right glyph for each legend key, using blanks (linetype 0, shape NA) where a key should show no line or no point.

ggplot(ggplotdata, aes(x = decile, y = predicted, color = model)) +
    geom_line(aes(alpha = "prédit")) +                 # constant alpha level -> "prédit" legend key
    geom_point(aes(y = observed, alpha = "observé")) + # constant alpha level -> "observé" legend key
    scale_alpha_manual(name = NULL, values = c(1, 1),   # both levels fully opaque
                       guide = guide_legend(override.aes = list(linetype = c(0, 1), shape = c(16, NA)))) +
    scale_color_manual(name = NULL, values = c("firebrick", "darkblue"))
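The unnamed color values and the override.aes vectors in the code above are matched to legend levels by position, which for character columns means alphabetical order ("algo30" before "xg", "observé" before "prédit"). A slightly more defensive variant, offered as a sketch rather than a correction, names the color values and fixes the alpha breaks explicitly, so a renamed or reordered level cannot silently swap colors or glyphs. It recreates the question's data so it runs on its own, uses a plain + in place of magrittr::add(), and uses English labels ("observed", "predicted") instead of the French ones.

```r
library(ggplot2)

# Recreate the question's data (same seed and steps; `+` replaces magrittr::add).
set.seed(123)
gr <- 1:10
obs.xg <- sort(runif(10, 0.5, 1))
obs.30 <- sort(runif(10, 0.5, 1))
pred.xg <- predict(lm(obs.xg ~ gr)) + rnorm(10, 0, .01)
pred.30 <- predict(lm(obs.30 ~ gr)) + rnorm(10, 0, .01)

ggplotdata <- data.frame(
  model     = rep(c("xg", "algo30"), each = 10),
  decile    = rep(gr, 2),
  observed  = c(obs.xg, obs.30),
  predicted = c(pred.xg, pred.30)
)

p <- ggplot(ggplotdata, aes(x = decile, color = model)) +
  geom_line(aes(y = predicted, alpha = "predicted")) +
  geom_point(aes(y = observed, alpha = "observed")) +
  scale_alpha_manual(
    name = NULL,
    breaks = c("observed", "predicted"),     # fixes the legend key order
    values = c(observed = 1, predicted = 1), # both fully opaque
    # override.aes follows `breaks`: point-only key, then line-only key
    guide = guide_legend(override.aes = list(linetype = c(0, 1), shape = c(16, NA)))
  ) +
  scale_color_manual(name = NULL, values = c(xg = "darkblue", algo30 = "firebrick"))

p
```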
