5
votes

I have data for 4 sectors (A, B, C, D) and 5 years. I would like to draw 4 lines, one per sector, with a point for every year, and add a fifth line showing the mean via stat_summary. I control the line colors with scale_color_manual and the point shapes through the shape aesthetic in aes(). The problem is that with the point geom added, the legend is split into two parts, one for point shapes and one for line colors. I don't understand how to obtain a single legend combining colors and points.

Here is an example. First of all let's build the data frame dtfr as follows:

a <- 100; b <- 100; c <- 100; d <- 100
for(k in 2:5){
  a[k] <- a[k-1]*(1+rnorm(1)/100)
  b[k] <- b[k-1]*(1+rnorm(1)/100)
  c[k] <- c[k-1]*(1+rnorm(1)/100)
  d[k] <- d[k-1]*(1+rnorm(1)/100)
}
v <- numeric()
for(k in 1:5){ v <- c(v,a[k],b[k],c[k],d[k]) }

dtfr <- data.frame(Year=rep(2008:2012, each=4),
                   Sector=rep(c("A","B","C","D"), 5),
                   Value=v,
                   stringsAsFactors=FALSE)

Now let us draw the graph with ggplot2. In the first graph we draw the line and point geoms without the mean line:

library(ggplot2)
ggplot(dtfr, aes(x=Year, y=Value)) +
  geom_line(aes(group=Sector, color=Sector)) +
  geom_point(aes(color=Sector, shape=Sector)) +
  # stat_summary(aes(colour="mean",group=1), fun.y=mean, geom="line", size=1.1) +
  scale_color_manual(values=c("#004E00", "#33FF00", "#FF9966", "#3399FF", "#FF004C")) +
  ggtitle("Test for ggplot2 graph")

In this graph we have the legend with line colors and point shapes all in one:

[Image: first graph, with a single legend combining line colors and point shapes]

But if I use the stat_summary to draw the mean line using the following code:

ggplot(dtfr, aes(x=Year, y=Value)) +
  geom_line(aes(group=Sector, color=Sector)) +
  geom_point(aes(color=Sector, shape=Sector)) +
  stat_summary(aes(colour="mean",group=1), fun.y=mean, geom="line", size=1.1) +
  scale_color_manual(values=c("#004E00", "#33FF00", "#FF9966", "#3399FF", "#FF004C")) +
  ggtitle("Test for ggplot2 graph")

I get the mean (red) line, but the legend is split into two parts, one for line colors and one for point shapes. So my question is: how can I draw the mean line and still get a legend like the one in the first graph? That is, how do I get a single legend combining lines and shapes in the second graph, where the mean line is drawn?


2 Answers

10
votes

Try this:

ggplot(dtfr, aes(x=Year, y=Value)) +
    geom_line(aes(group=Sector, color=Sector)) +
    geom_point(aes(color=Sector, shape=Sector)) +
    stat_summary(aes(colour="mean",shape="mean",group=1), fun.y=mean, geom="line", size=1.1) +
    scale_color_manual(values=c("#004E00", "#33FF00", "#FF9966", "#3399FF", "#FF004C")) +
    scale_shape_manual(values=c(1:4, 32)) +
    ggtitle("Test for ggplot2 graph")

ggplot2 example with a combined legend

Maybe someone more knowledgeable can come in and correct my explanation (or provide a better solution), but here's how I understand it: you have 5 values in the color scale, but only 4 in the shape scale; you're missing a value for "mean". ggplot2 only merges legends whose scales share the same title and the same labels, so the two scales are not compatible and each gets its own legend. You can fix this by mapping shape to "mean" as well and assigning it a blank shape (32), so that both scales cover the same 5 values.
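For readers on a recent ggplot2, the same idea can be written without the older argument names. This is only a sketch under assumptions: it assumes ggplot2 3.4.0 or later, where `fun` replaces the deprecated `fun.y` and `linewidth` replaces `size` for lines. The `set.seed()` call and the `sapply()`-based data construction are my own additions for reproducibility and are not part of the question. ggplot2 may also warn that the `shape` aesthetic is ignored by the line geom; as far as I can tell the mapping still trains the shape scale, which is what the trick relies on.

```r
library(ggplot2)

# Rebuild data shaped like the question's dtfr; the seed is an addition
# so the example is reproducible.
set.seed(42)
sectors <- c("A", "B", "C", "D")
# One column per sector: start at 100, then four random yearly changes.
paths <- sapply(sectors, function(s) 100 * cumprod(c(1, 1 + rnorm(4) / 100)))
dtfr <- data.frame(Year = rep(2008:2012, each = 4),
                   Sector = rep(sectors, times = 5),
                   Value = as.vector(t(paths)),
                   stringsAsFactors = FALSE)

# Same layers as the answer above, with `fun` and `linewidth`
# (assumed ggplot2 >= 3.4.0) instead of `fun.y` and `size`.
p <- ggplot(dtfr, aes(x = Year, y = Value)) +
  geom_line(aes(group = Sector, color = Sector)) +
  geom_point(aes(color = Sector, shape = Sector)) +
  stat_summary(aes(colour = "mean", shape = "mean", group = 1),
               fun = mean, geom = "line", linewidth = 1.1) +
  scale_color_manual(values = c("#004E00", "#33FF00", "#FF9966",
                                "#3399FF", "#FF004C")) +
  scale_shape_manual(values = c(1:4, 32)) +  # 32 is a blank shape for "mean"
  ggtitle("Test for ggplot2 graph")
# Call print(p) to render the plot.
```

Both manual scales now carry five values in the same order (A, B, C, D, mean), which is what lets the two legends merge.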

1
votes

Here is a different approach that calculates the summary/mean beforehand and adds it as an additional level to the data frame before building the plot.

The approach makes it easy to add an extra line with its own specific color, which is often what you want for a summary such as the mean.

First, I calculate the mean and add it to the dtfr of the OP.

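library(dplyr)  # provides %>% as well as the verbs used below

dtfr2 <- dtfr %>%
    dplyr::group_by(Year) %>%
    dplyr::summarise(Value = mean(Value)) %>%
    dplyr::mutate(Sector = NA_character_) %>%  # character NA, matching Sector's type
    dplyr::bind_rows(dtfr)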

dtfr2 now has additional rows with the mean values stored in Value and NAs in Sector.
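If dplyr is not available, roughly the same table can be built in base R. The sketch below is my own substitution, not the answer's code: it uses `aggregate()` and `rbind()`, and rebuilds a stand-in for `dtfr` with a fixed seed (an assumption added for reproducibility) so that it runs on its own.

```r
# Stand-in for the question's dtfr; the seed is an addition for reproducibility.
set.seed(1)
sectors <- c("A", "B", "C", "D")
# One column per sector: start at 100, then four random yearly changes.
paths <- sapply(sectors, function(s) 100 * cumprod(c(1, 1 + rnorm(4) / 100)))
dtfr <- data.frame(Year = rep(2008:2012, each = 4),
                   Sector = rep(sectors, times = 5),
                   Value = as.vector(t(paths)),
                   stringsAsFactors = FALSE)

# Mean of Value per Year, one row per year.
means <- aggregate(Value ~ Year, data = dtfr, FUN = mean)
# Mark the mean rows with a missing Sector, as in the dplyr version.
means$Sector <- NA_character_
# Put the mean rows first, then the original rows.
dtfr2 <- rbind(means[, c("Year", "Sector", "Value")], dtfr)
```

The result has the 5 mean rows followed by the 20 original rows, with the same three columns as `dtfr`.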

Then, building the plot is easy:

p1 <- ggplot(dtfr2, aes(x=Year, y=Value, color = Sector, shape = Sector)) +
    geom_line() +
    geom_point()

Finally, you may tweak the legend a little:

p1 +
    scale_color_discrete(labels = c(letters[1:4], "M"), na.value = "black") +
    scale_shape_discrete(labels = c(letters[1:4], "M"))

ggplot with additional geom_line with specific color