1
votes

I have several columns of data that I want to plot as a boxplot with ggplot. Each box represents a single column of data. The boxes should be colored in repeating sets of four (red, green, blue, yellow), i.e. the 2nd box in each set is green, the 4th is yellow, and so on.

Sample data:

X1 X1.1 X1.2 X1.3 X2 X2.1 X2.2 X2.3
1    2   3    4    3   2    3    1
2    4   5    5    5   2    1    2
2    3   2    1    2   1    2    5

The closest I got was filling a vector colorVec with the repeating color values and passing it to ggplot:

graph <- ggplot(expressionframemelted, aes(x = Var2, y = value)) +
  geom_boxplot(aes(fill = factor(Var2))) +
  ggtitle("Expression Values and Medians") + xlab(valueAmountsP) + ylab("Counts log 10") +
  # ggplot2 >= 3.3.0 uses fun (formerly fun.y) and show.legend (formerly show_guide)
  stat_summary(fun = median, geom = "point", position = position_dodge(width = .9),
               size = 6, shape = 4, show.legend = FALSE) +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_x_discrete(labels = nameVecGraph) +
  scale_y_log10() +
  scale_fill_manual(values = colorVec)

The problem is that when a column's values are so low (or zero) that no box appears on the plot, ggplot skips the fill for that column and carries on with the next one, which throws off the color order.

Is there an easier way of doing this?

EDIT: I tried epi's answer, but the problem of ggplot skipping over columns with low values and messing up the color order remains. I think it might be due to the log scale. For example, try

ggplot(dfmelt, aes(variable, value, fill=variable)) +
  geom_boxplot() +
  theme(axis.text.x=element_text(angle=90)) +
  scale_x_discrete(labels=c('C1','C2','C3','C4','C5','C6','C7','C8')) +
  scale_y_log10() +
  scale_fill_manual(values=rep(c("red","green","blue","yellow"),2))

on

df = read.table(text="X1 X1.1 X1.2 X1.3 X2 X2.1 X2.2 X2.3
            1    0   3    4    3   2    3    1
            2    'NA'   5    5    5   2    1    2
            2     'NA'   2    1    2   1    2    5", header=TRUE)

1 Answer

5
votes

How about something like this:

df = read.table(text="X1 X1.1 X1.2 X1.3 X2 X2.1 X2.2 X2.3
1    2   3    4    3   2    3    1
2    4   5    5    5   2    1    2
2    3   2    1    2   1    2    5", header=TRUE)

library(reshape2)
library(dplyr)
library(ggplot2)

ggplot(df %>% melt(), aes(variable, value, fill=variable)) +
  geom_boxplot() +
  scale_fill_manual(values=rep(c("red","green","blue","yellow"),2))


If you make your code reproducible (in this case, that would mean providing a data sample that works with the code you posted), I can tailor my answer more directly to your question.
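If the number of columns varies, the color vector does not have to be typed out by hand. The following is only a sketch, assuming the same sample data as above: `palette4` is a name made up for this example, and `colorVec` borrows the name from the question. It recycles the four colors with `rep(..., length.out = ...)` and names each color after the factor level it belongs to. `scale_fill_manual` matches a named `values` vector to levels by name, so the pairing of column and color is explicit.

```r
library(reshape2)
library(ggplot2)

df <- read.table(text = "X1 X1.1 X1.2 X1.3 X2 X2.1 X2.2 X2.3
1 2 3 4 3 2 3 1
2 4 5 5 5 2 1 2
2 3 2 1 2 1 2 5", header = TRUE)

# Melt every column; 'variable' becomes a factor whose levels follow column order
dfmelt <- melt(df, measure.vars = names(df))

# Recycle the four colors over however many columns there are, and
# name each color after the factor level (column) it should fill.
palette4 <- c("red", "green", "blue", "yellow")  # hypothetical name
colorVec <- setNames(rep(palette4, length.out = nlevels(dfmelt$variable)),
                     levels(dfmelt$variable))

p <- ggplot(dfmelt, aes(variable, value, fill = variable)) +
  geom_boxplot() +
  scale_fill_manual(values = colorVec)
print(p)
```

Printing `colorVec` before plotting is a quick way to check which column gets which color.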

UPDATE: In answer to your edited question and your comment: ggplot is not plotting the second column from your updated data set because it contains no positive values. Under a log transformation, zero becomes -Inf and negative values become NaN (for real numbers), so there is nothing to plot and ggplot drops that x-value when assigning colors. To maintain the coloring order, add drop=FALSE to scale_fill_manual so that the level with no drawable data still keeps its slot in the color sequence.

# dfmelt is the melted version of the updated data, e.g. dfmelt = melt(df)
ggplot(dfmelt, aes(variable, value, fill=variable)) +
  geom_boxplot(show.legend=FALSE) +  # show.legend replaces the older show_guide
  theme(axis.text.x=element_text(angle=90, vjust=0.5)) +
  scale_x_discrete(labels=c('C1','C2','C3','C4','C5','C6','C7','C8')) +
  scale_y_log10(breaks=1:5) +
  scale_fill_manual(values=rep(c("red","green","blue","yellow"),2), drop=FALSE)
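The log-scale behaviour described in the update can be checked in base R without drawing anything. This is a small sketch, not part of the original code: `has_positive` is a hypothetical helper, and the data frame `d` is the first three columns of the updated data, typed in by hand. It flags columns that would leave no box on a log axis.

```r
# Zero, a negative value, a missing value, and a positive value
x <- c(0, -1, NA, 3)
logged <- suppressWarnings(log10(x))  # log10(-1) would otherwise warn "NaNs produced"
print(logged)             # -Inf, NaN, NA, then a finite value
print(is.finite(logged))  # FALSE FALSE FALSE TRUE

# Hypothetical helper: TRUE for each column with at least one positive value,
# i.e. a column that still has something to draw after a log transformation
has_positive <- function(d) {
  sapply(d, function(col) any(col > 0, na.rm = TRUE))
}

# First three columns of the updated data from the question
d <- data.frame(X1 = c(1, 2, 2), X1.1 = c(0, NA, NA), X1.2 = c(3, 5, 2))
print(has_positive(d))    # X1.1 is FALSE: no box can be drawn for it
```

Running a check like this before plotting shows which columns need drop=FALSE to keep their place in the color order.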
