11
votes

I am trying to control the order of items in a legend in a ggplot2 plot in R. I looked up some similar questions and found out about changing the order of the levels of the factor variable I am plotting. I am plotting data for 4 months: December, January, July, and June.

If I use a single plot command for all the months, it works as expected: the months appear in the legend in the order of the factor levels. However, I need a different dodge value for the summer (June and July) and winter (December and January) data, which I do with two geom_pointrange commands. When I split the plot into these 2 steps, the legend order goes back to alphabetical. You can see this by commenting out either the "plot summer" or the "plot winter" command.

What can I change to keep my factor level order in the legend?

Please ignore the odd looking test data - the real data looks fine in this plot format.

library(ggplot2)

#testdata
hour <- rep(seq(from=1,to=24,by=1),4)
avg_hou <- sample(seq(0,0.5,0.001),96,replace=TRUE)
lower_ci <- avg_hou - sample(seq(0,0.05,0.001),96,replace=TRUE)
upper_ci <- avg_hou + sample(seq(0,0.05,0.001),96,replace=TRUE)
Month <- c(rep("December",24), rep("January",24), rep("June",24), rep("July",24))

testdata <- data.frame(Month,hour,avg_hou,lower_ci,upper_ci)
testdata$Month <- factor(testdata$Month,levels=c("June", "July", "December","January"))

#basic plot setup
plotx <- ggplot(testdata, aes(x = hour, y = avg_hou, ymin = lower_ci, ymax = upper_ci, color = Month, shape = Month))
plotx <- plotx + scale_color_manual(values = c("June" = "#FDB863", "July" = "#E66101",  "December" = "#92C5DE", "January" = "#0571B0"))

#plot summer
plotx  <- plotx + geom_pointrange(data = testdata[testdata$Month == "June" | testdata$Month == "July",], size = 1, position=position_dodge(width=0.3)) 
#plot winter
plotx  <- plotx + geom_pointrange(data = testdata[testdata$Month == "December" | testdata$Month == "January",], size = 1, position=position_dodge(width=0.6))

print(plotx)
2
+1 for posting your first question with reproducible example, showing us the code you have tried and a clear description of the desired result. Cheers. - Henrik
Thanks - I find that is the most helpful way when I am trying to find solutions in others' questions as well. - Scott

2 Answers

13
votes

One possibility is to add a geom_blank as the first layer in the plot. From ?geom_blank: "The blank geom draws nothing, but can be a useful way of ensuring common scales between different plots." We tell the geom_blank layer to use the entire data set, so it sets up a scale that includes all levels of 'Month' in the correct order. Then add the two geom_pointrange layers, each of which uses a subset of the data.

Perhaps a matter of taste in this particular case, but I tend to prefer to prepare the data sets before I use them in ggplot.

df_sum <- testdata[testdata$Month %in% c("June", "July"), ]
df_win <- testdata[testdata$Month %in% c("December", "January"), ]

ggplot(data = testdata, aes(x = hour, y = avg_hou, ymin = lower_ci, ymax = upper_ci,
       color = Month, shape = Month)) +
  geom_blank() +
  geom_pointrange(data = df_sum, size = 1, position = position_dodge(width = 0.3)) +
  geom_pointrange(data = df_win, size = 1, position = position_dodge(width = 0.6)) +
  scale_color_manual(values = c("June" = "#FDB863", "July" = "#E66101",
                     "December" = "#92C5DE", "January" = "#0571B0"))
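A possible alternative to the blank layer, sketched here rather than taken from the answer above, is to state the legend order explicitly through the `limits` argument of the scales. The names `month_levels` and `month_cols` are introduced only for this illustration, and the `set.seed()` call is an addition for reproducibility. Because both colour and shape are mapped to Month, the same limits are passed to both scales so that the two legends can still be merged. This has not been checked against every ggplot2 version.

```r
library(ggplot2)

# Rebuild the question's test data (seed added here for reproducibility)
set.seed(1)
month_levels <- c("June", "July", "December", "January")
testdata <- data.frame(
  Month   = rep(c("December", "January", "June", "July"), each = 24),
  hour    = rep(1:24, 4),
  avg_hou = sample(seq(0, 0.5, 0.001), 96, replace = TRUE)
)
testdata$lower_ci <- testdata$avg_hou - sample(seq(0, 0.05, 0.001), 96, replace = TRUE)
testdata$upper_ci <- testdata$avg_hou + sample(seq(0, 0.05, 0.001), 96, replace = TRUE)
testdata$Month <- factor(testdata$Month, levels = month_levels)

# Same subsets as in the answer
df_sum <- testdata[testdata$Month %in% c("June", "July"), ]
df_win <- testdata[testdata$Month %in% c("December", "January"), ]

# Illustrative name for the colour vector used in the question
month_cols <- c("June" = "#FDB863", "July" = "#E66101",
                "December" = "#92C5DE", "January" = "#0571B0")

# No geom_blank(): the desired order is given to both scales via limits
p <- ggplot(mapping = aes(x = hour, y = avg_hou, ymin = lower_ci, ymax = upper_ci,
                          color = Month, shape = Month)) +
  geom_pointrange(data = df_sum, size = 1, position = position_dodge(width = 0.3)) +
  geom_pointrange(data = df_win, size = 1, position = position_dodge(width = 0.6)) +
  scale_color_manual(values = month_cols, limits = month_levels) +
  scale_shape(limits = month_levels)

print(p)
```

Whether this reads better than geom_blank is a matter of taste: the blank layer lets the factor levels drive the order, while `limits` repeats the order in the scale definitions.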


2
votes

Another way to think about "dodge" is as an offset from the x-values based on group (in this case Month). So if we add a dodge (x-offset) column to your original data, based on month:

# your original sample data
# note the use of set.seed(...) so "random" data is reproducible
set.seed(1)
hour     <- rep(seq(from=1,to=24,by=1),4)
avg_hou  <- sample(seq(0,0.5,0.001),96,replace=TRUE)
lower_ci <- avg_hou - sample(seq(0,0.05,0.001),96,replace=TRUE)
upper_ci <- avg_hou + sample(seq(0,0.05,0.001),96,replace=TRUE)
Month    <- c(rep("December",24), rep("January",24), rep("June",24), rep("July",24))
testdata       <- data.frame(Month,hour,avg_hou,lower_ci,upper_ci)
testdata$Month <- factor(testdata$Month,levels=c("June", "July", "December","January"))

# add offset column for dodge
testdata$dodge <- -2.5+(as.integer(testdata$Month))

# create ggplot object and default mappings
ggp <- ggplot(testdata, aes(x=hour, y = avg_hou, ymin = lower_ci, ymax = upper_ci, color = Month, shape = Month))
ggp <- ggp + scale_color_manual(values = c("June" = "#FDB863", "July" = "#E66101", "December" = "#92C5DE", "January" = "#0571B0"))

# plot the point range
ggp + geom_pointrange(aes(x=hour+0.2*dodge), size=1)

Produces this:

This does not require geom_blank(...) to maintain the scale order, and it does not require two calls to geom_pointrange(...).
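To make the offset idea concrete, the sketch below first prints the offsets implied by the formula above (evenly spaced around zero), then shows how a lookup vector could give the summer and winter pairs different spacing, as the question asked. The `x_off` column and the values in `month_offsets` are illustrative assumptions, not part of the answer. They are chosen to sit roughly where `position_dodge(width = 0.3)` and `position_dodge(width = 0.6)` would place two groups.

```r
library(ggplot2)

month_levels <- c("June", "July", "December", "January")

# Offsets implied by the answer's formula, one value per factor level
answer_off <- 0.2 * (-2.5 + seq_along(month_levels))
print(setNames(answer_off, month_levels))

# Hypothetical per-month offsets: tighter for summer, wider for winter
month_offsets <- c(June = -0.075, July = 0.075, December = -0.15, January = 0.15)

# Same structure as the sample data above
set.seed(1)
testdata <- data.frame(
  Month   = rep(c("December", "January", "June", "July"), each = 24),
  hour    = rep(1:24, 4),
  avg_hou = sample(seq(0, 0.5, 0.001), 96, replace = TRUE)
)
testdata$lower_ci <- testdata$avg_hou - sample(seq(0, 0.05, 0.001), 96, replace = TRUE)
testdata$upper_ci <- testdata$avg_hou + sample(seq(0, 0.05, 0.001), 96, replace = TRUE)
testdata$Month <- factor(testdata$Month, levels = month_levels)

# Look up each row's offset by month name
testdata$x_off <- unname(month_offsets[as.character(testdata$Month)])

# One layer, one data set: the factor levels set the legend order
p <- ggplot(testdata, aes(x = hour + x_off, y = avg_hou,
                          ymin = lower_ci, ymax = upper_ci,
                          color = Month, shape = Month)) +
  geom_pointrange(size = 1) +
  scale_color_manual(values = c("June" = "#FDB863", "July" = "#E66101",
                                "December" = "#92C5DE", "January" = "#0571B0"))
print(p)
```

With a lookup vector the spacing for each month can be tuned independently, at the cost of maintaining the offsets by hand.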