3
votes

I have a data frame in R in which several of the columns are factors. I'd like to create a series of bar charts showing the relative sizes of each of the factor levels. I want to associate my own customized color palettes to each of the factors, and then customize the final layout of all of the bars and legends using the gridExtra package.

I wrote an example script that I think should achieve this; however, I obtained a rather surprising result:

library(ggplot2)
library(grDevices)
library(gridExtra)

# Define some dummy data and put it in a data frame
fruit <- factor(c("apple", "orange", "pear", "pear", "pear",
                  "orange", "apple", "apple", "apple", "pear"))
cheese <- factor(c("cheddar", "mozarella", "gruyere", "gruyere", "gouda",
                   "parmesan", "gruyere", "gouda", "mozarella", "cheddar"))
mydata <- data.frame(fruit, cheese)
mydata$dummy <- 0

# Define some custom color schemes
foodclrs <- list()
# Plot the fruit factor in shades of red
h <- c(0.0,  0.0,  0.0)
s <- c(0.95, 0.85, 0.45)
v <- c(0.45, 0.85, 0.95)
foodclrs[[1]] <- hsv(h, s, v)
# Plot the cheese factor in shades of green
h <- c(0.33, 0.33, 0.33, 0.33, 0.33)
s <- c(0.95, 0.93, 0.85, 0.69, 0.45)
v <- c(0.45, 0.69, 0.85, 0.93, 0.95)
foodclrs[[2]] <- hsv(h, s, v)

# Create vectors with individualized text for each plot
bsiz=20
fillvars <- c("fruit", "cheese")
xlabels <- c("Fruits", "Cheeses")
lgdlabels <- c("Types of Fruit", "Types of Cheese")

# Generate a list of plots
plots <- list()
for (ii in 1:2) {
  plots[[ii]] <- ggplot(data=mydata) + 
    geom_bar(aes_string(x="dummy", fill=fillvars[ii]),
             position=position_stack(reverse=TRUE)) +
    scale_fill_manual(values=foodclrs[[ii]], drop=FALSE) +
    theme_bw(base_size=bsiz) +
    labs(x=xlabels[ii], y="") + 
    theme(axis.ticks.y=element_blank(),
          axis.text.y=element_blank()) +
    guides(fill=guide_legend(title=lgdlabels[ii])) +
    coord_flip()
#  print(plots[[ii]])
}

# Print the plots on my own custom-shaped grid
print(grid.arrange(plots[[1]], plots[[2]], ncol=1, nrow=2))

The output of the script looks like this:

[Image: incorrect output; both bar charts are in shades of green, but the top one should be in shades of red.]

This is not what I was expecting: the color palette for the upper bar chart should have been a range of shades of red. It seems that, although I originally defined the plot object plots[[1]] to have a red color palette associated with it, when I actually went to print it, either R or ggplot2 (I'm not sure which) decided to use the most recent color palette instead; i.e., the one associated with plots[[2]].

Now here's the weird part. If I uncomment the print statement in the for loop, I get two individual plots which are rendered in the correct color scheme (for brevity, I do not include either of them here), and, even more interestingly, the combined bar chart object inside the grid.arrange() function now also displays the correct color scheme:

[Image: correctly colored plots, using the workaround technique.]

While I am happy to have stumbled across this little workaround, now I'm curious: why does it even work in the first place?

That is to say, how is it that calling the "print" statement just at the correct moment inside of the for loop causes a color palette to become permanently attached to each ggplot object, when otherwise it would not?

What's really going on here, "underneath the hood", so to speak? And also, is there a less kludgy way that I could correct the problem? For example, is there some other function that I could call instead of print(), to get the color palette to attach correctly to each plot object, without creating a bunch of individual "dummy" plots that I don't actually need?

2
you could store grobs instead of plots, using ggplotGrob(). - baptiste
Based upon your profile, it appears that you are actually the main author and maintainer of the gridExtra package? Thanks for taking the time to respond; I know you must be very busy! FWIW, I think I actually prefer your suggestion to use the ggplotGrob() function even more than your official answer; it seems simpler and cleaner than encapsulating the ggplot call within a function and using purrr::pmap() to implement the loop. However, I did upvote both of them, since both seemed like valid solutions. - stachyra

2 Answers

3
votes

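ggplot2 isn't very good with scoping and lazy evaluation: here the palette passed to scale_fill_manual() is apparently not evaluated until the plot is built. A possible workaround is to wrap the plot construction in a function, but even then I was surprised to need force() to make the fill argument evaluate immediately: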

f <- function(var, fill, xlab, lab){
  force(fill)
  ggplot(data=mydata) + 
    geom_bar(aes_string(x="dummy", fill=var),
             position=position_stack(reverse=TRUE)) +
    scale_fill_manual(values=fill) +
    theme_bw(base_size=bsiz) +
    labs(x=xlab, y="") +
    theme(axis.ticks.y=element_blank(),
          axis.text.y=element_blank()) +
    guides(fill=guide_legend(title=lab)) +
    coord_flip()
}

pl <- purrr::pmap(.f = f, 
                  .l = list(var=fillvars, fill=foodclrs,
                            xlab=xlabels,lab=lgdlabels))

grid.arrange(grobs=pl, nrow=2)
2
votes

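Explanation for what's going on under the hood: You specified your scale values as foodclrs[[ii]]. R evaluates function arguments lazily, and here the expression foodclrs[[ii]] appears to be kept unevaluated inside the scale until the plot is actually built. Nothing is built inside the loop, so all the plot objects were evaluated after the loop had finished, with the current (i.e. last) value of ii, and both picked up foodclrs[[2]]. Calling print() inside the loop builds each plot immediately, which evaluates the expression while ii still has the right value; once evaluated, the result stays with the plot object.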
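The same effect can be reproduced in base R, without ggplot2. The sketch below is only an illustration, and it assumes that the scale keeps its values argument inside a closure without evaluating it. make_pal() and make_pal_forced() are hypothetical stand-ins written for this example, not ggplot2 functions.

```r
# Hypothetical stand-in for a scale constructor: it keeps `values` inside a
# closure but never evaluates it, so `values` stays an unevaluated promise.
make_pal <- function(values) function() values

# Same idea, but force() evaluates the argument when the constructor is called.
make_pal_forced <- function(values) {
  force(values)
  function() values
}

cols <- list("red", "green")

# 1. Lazy: nothing is evaluated inside the loop.
lazy <- list()
for (ii in 1:2) lazy[[ii]] <- make_pal(cols[[ii]])
lazy_first <- lazy[[1]]()        # cols[[ii]] is evaluated only now, with ii == 2

# 2. Using each closure inside the loop, comparable to print() on a plot.
touched <- list()
for (ii in 1:2) {
  touched[[ii]] <- make_pal(cols[[ii]])
  touched[[ii]]()                # evaluates cols[[ii]] while ii is still correct
}
touched_first <- touched[[1]]()

# 3. Forcing the argument inside the constructor.
forced <- list()
for (ii in 1:2) forced[[ii]] <- make_pal_forced(cols[[ii]])
forced_first <- forced[[1]]()

# lazy is "green"; touched and forced are "red"
print(c(lazy = lazy_first, touched = touched_first, forced = forced_first))
```

A promise is evaluated at most once, which is why touching the object inside the loop is enough to fix the value permanently.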

Workaround 1: Assuming you know beforehand how many colours you need from each palette, one possible workaround is to name all of them explicitly in a single combined palette:

combined.foodclrs <- c(foodclrs[[1]][seq_along(levels(fruit))], 
                       foodclrs[[2]][seq_along(levels(cheese))])
names(combined.foodclrs) <- c(levels(fruit), levels(cheese))

> combined.foodclrs
    apple    orange      pear   cheddar     gouda   gruyere mozarella  parmesan 
"#730606" "#D92121" "#F28585" "#087306" "#10B00C" "#24D921" "#4DED4A" "#87F285" 

You can then use the combined palette in the loop and get the correct colours in each plot:

plots <- list()
for (ii in 1:2) {
  plots[[ii]] <- ggplot(data=mydata) + 
    geom_bar(aes_string(x="dummy", fill=fillvars[ii]),
             position=position_stack(reverse=TRUE)) +
    scale_fill_manual(values=combined.foodclrs, drop=FALSE) + #one combined palette
    theme_bw(base_size=bsiz) +
    labs(x=xlabels[ii], y="") + 
    theme(axis.ticks.y=element_blank(),
          axis.text.y=element_blank()) +
    guides(fill=guide_legend(title=lgdlabels[ii])) +
    coord_flip()
}

print(grid.arrange(plots[[1]], plots[[2]], ncol=1, nrow=2))
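Workaround 1 relies on every level name being unique across the factors, because a named palette is matched to factor levels by name. The sketch below wraps the construction in a small helper that checks this first. combine_palettes() is a hypothetical name introduced for this example, and the data are a shortened copy of the question's factors.

```r
# Shortened copies of the question's factors and palettes.
fruit  <- factor(c("apple", "orange", "pear"))
cheese <- factor(c("cheddar", "mozarella", "gruyere", "gouda", "parmesan"))
foodclrs <- list(
  grDevices::hsv(rep(0.00, 3), c(0.95, 0.85, 0.45), c(0.45, 0.85, 0.95)),
  grDevices::hsv(rep(0.33, 5), c(0.95, 0.93, 0.85, 0.69, 0.45),
                 c(0.45, 0.69, 0.85, 0.93, 0.95))
)

# Hypothetical helper: build one named palette from several factors,
# refusing to continue if two factors share a level name.
combine_palettes <- function(factors, palettes) {
  lv <- lapply(factors, levels)
  all_levels <- unlist(lv, use.names = FALSE)
  dup <- unique(all_levels[duplicated(all_levels)])
  if (length(dup) > 0) {
    stop("level names shared between factors: ", paste(dup, collapse = ", "))
  }
  if (any(lengths(palettes) < lengths(lv))) {
    stop("a palette has fewer colours than its factor has levels")
  }
  # Take as many colours from each palette as its factor has levels.
  cols <- unlist(Map(function(p, l) p[seq_along(l)], palettes, lv),
                 use.names = FALSE)
  stats::setNames(cols, all_levels)
}

combined.foodclrs <- combine_palettes(list(fruit, cheese), foodclrs)
print(combined.foodclrs)
```

If two factors did share a level name, a single combined palette could give that level only one colour, so stopping early seems safer than silently reusing it.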

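Workaround 2: As per @baptiste's comment, you can convert each plot to a grob with ggplotGrob() inside the loop and store the grobs instead of the plots. The conversion builds the plot immediately, while ii still has the right value: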

plots <- list()
for (ii in 1:2) {
  p <- ggplot(data=mydata) + 
    geom_bar(aes_string(x="dummy", fill=fillvars[ii]),
             position=position_stack(reverse=TRUE)) +
    scale_fill_manual(values=foodclrs[[ii]], drop=FALSE) +
    theme_bw(base_size=bsiz) +
    labs(x=xlabels[ii], y="") + 
    theme(axis.ticks.y=element_blank(),
          axis.text.y=element_blank()) +
    guides(fill=guide_legend(title=lgdlabels[ii])) +
    coord_flip()
  plots[[ii]] <- ggplotGrob(p)
}

print(grid.arrange(plots[[1]], plots[[2]], ncol=1, nrow=2))

[Image: the resulting plot]

(Output is the same for both methods)