I am trying to superimpose two different plots, each colored by a different factor, while keeping control of the coloring order (i.e., without ggplot deciding on the order of the factor levels).

For example, say I have these two different dataframes:

labels1 <- factor(rep(LETTERS[1:2], each=5), levels=LETTERS[2:1])
dfp1 <- data.frame(L1=labels1, x=1:10, y=rep(1:2, each=5))

labels2 <- factor(rep(letters[1:2], each=5), levels=letters[2:1])
dfp2 <- data.frame(L2=labels2, x=(1:10), y=rep(c(0.25, 0.75), each=5))

and I want to plot an ECDF of the first one, coloring by L1:

p <- 
   ggplot(dfp1, aes(x=x, y=y, color=L1)) +
   stat_ecdf()

This generates a plot where the color for B comes before the color for A.

If I now want to overlay the points of the second dataframe on top of this, I can do

p + geom_point(data=dfp2, mapping=aes(x=x, y=y, color=L2))

but then ggplot combines the two factors, L1 and L2, into one and orders their levels alphabetically.

I have several of these plots to make, with the labels for the points changing, so this reordering completely changes the colors of the elements between plots. I would like to convince ggplot to keep the levels of each factor in the order they are given, and to keep the order of the factors too, so that the coloring is done according to

factor(c(as.character(labels1), as.character(labels2)),
       levels=c(levels(labels1), levels(labels2)))

How can I do that?

1 Answer

You can use a named color palette to tie each level to a specific color, like this:

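# Colors are matched to factor levels by name, not by position
cust <- c("purple", "orange", "green", "brown")
# Name order: levels of L1 as given, then levels of L2 as given
names(cust) <- c(levels(labels1), levels(labels2))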

p + geom_point(data=dfp2, mapping=aes(x=x, y=y, color=L2)) +
    scale_color_manual(values = cust)
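
The named palette fixes which color each level gets, but it does not by itself control the order of the legend entries. A minimal self-contained sketch of one way to pin that order too is to pass the same names as `breaks`. The names `lvl_order` and `p_overlay` are introduced here for illustration only, and the ECDF layer maps only `x` in this sketch (an assumption made to keep it simple, not the asker's exact call):

```r
library(ggplot2)

# Data from the question
labels1 <- factor(rep(LETTERS[1:2], each = 5), levels = LETTERS[2:1])
dfp1 <- data.frame(L1 = labels1, x = 1:10, y = rep(1:2, each = 5))
labels2 <- factor(rep(letters[1:2], each = 5), levels = letters[2:1])
dfp2 <- data.frame(L2 = labels2, x = 1:10, y = rep(c(0.25, 0.75), each = 5))

# Desired order: levels of L1 first, then levels of L2, each as given
# (lvl_order is a hypothetical helper name)
lvl_order <- c(levels(labels1), levels(labels2))

# Named palette: each color is looked up by level name
cust <- setNames(c("purple", "orange", "green", "brown"), lvl_order)

# ECDF of the first data frame, colored by L1 (only x is mapped here)
p <- ggplot(dfp1, aes(x = x, color = L1)) +
  stat_ecdf()

# Overlay the points and pin both the colors and the legend order
p_overlay <- p +
  geom_point(data = dfp2, mapping = aes(x = x, y = y, color = L2)) +
  scale_color_manual(values = cust, breaks = names(cust))
```

Because the colors are looked up by name, a given label should keep its color from one plot to the next as long as it appears in `cust`, so it may be convenient to build the palette once over every label you expect to use.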