I have several columns of data that I wish to plot as boxplots with ggplot. Each box represents a single column of data. The boxes should be colored in repeating sets of four (red, green, blue, yellow), i.e. the 1st, 5th, ... boxes are red, the 2nd, 6th, ... green, the 3rd, 7th, ... blue, and the 4th, 8th, ... yellow.
Sample data:
X1 X1.1 X1.2 X1.3 X2 X2.1 X2.2 X2.3
1 2 3 4 3 2 3 1
2 4 5 5 5 2 1 2
2 3 2 1 2 1 2 5
The closest I got is filling a vector, colorVec, with repeating color values and passing it to scale_fill_manual in ggplot:
graph <- ggplot(expressionframemelted, aes(x = Var2, y = value)) +
  geom_boxplot(aes(fill = factor(Var2))) +
  ggtitle("Expression Values and Medians") +
  xlab(valueAmountsP) +
  ylab("Counts log 10") +
  # mark each median with a cross; fun and show.legend replace the deprecated fun.y and show_guide
  stat_summary(fun = median, geom = "point", position = position_dodge(width = 0.9),
               size = 6, shape = 4, show.legend = FALSE) +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_x_discrete(labels = nameVecGraph) +
  scale_y_log10() +
  scale_fill_manual(values = colorVec)
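For reference, a minimal sketch of how a vector like colorVec can be built. The name nBoxes and its value of 8 are assumptions for illustration only; in practice it would be the number of columns being plotted.

```r
# Hypothetical: number of boxes (data columns) to be plotted
nBoxes <- 8

# Repeat the four-color cycle until there is exactly one entry per box
colorVec <- rep(c("red", "green", "blue", "yellow"), length.out = nBoxes)

print(colorVec)
```

An unnamed vector like this is assigned to the fill levels in order, so the coloring depends on which levels the scale ends up using.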
The problem is that if a column's values are so low (or zero) that no box appears on the plot, ggplot for some reason skips the fill for that column and continues with the next one, which throws off the ordering of the colors.
Is there an easier way of doing this?
EDIT: I tried epi's answer, but the problem remains: ggplot still skips over columns with low values and messes up the color order. I suspect it is caused by the log scale. For example, try
ggplot(dfmelt, aes(variable, value, fill = variable)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_x_discrete(labels = c('C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8')) +
  scale_y_log10() +
  scale_fill_manual(values = rep(c("red", "green", "blue", "yellow"), 2))
on the following data:
df = read.table(text="X1 X1.1 X1.2 X1.3 X2 X2.1 X2.2 X2.3
1 0 3 4 3 2 3 1
2 'NA' 5 5 5 2 1 2
2 'NA' 2 1 2 1 2 5", header=TRUE)
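The plot call above uses dfmelt, the long-format version of df, which is not shown. A minimal sketch of that step, using base R's stack() as a stand-in (an assumption: something like reshape2::melt(df) may be what was actually used, and it gives the same variable/value layout). The data is repeated here with NA written unquoted so the snippet runs on its own.

```r
# Same values as the table above, with NA unquoted
df <- read.table(text = "X1 X1.1 X1.2 X1.3 X2 X2.1 X2.2 X2.3
1 0 3 4 3 2 3 1
2 NA 5 5 5 2 1 2
2 NA 2 1 2 1 2 5", header = TRUE)

# Long format: one row per (column, value) pair.
# stack() returns columns 'values' and 'ind'; rename them to match the plot call.
dfmelt <- setNames(stack(df), c("value", "variable"))

# Column X1.1 contains only 0 and NA, and log10(0) is -Inf,
# so nothing in that column has a finite position on a log10 axis.
x11 <- dfmelt$value[dfmelt$variable == "X1.1"]
print(log10(x11))
```

This seems consistent with the log scale being involved: X1.1 is the column that ends up without a box.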