You can have ggplot use `boxplot.stats` (the same function used by base `boxplot`) to set the y-values for the box-and-whiskers and the outliers. For example:
```r
# Function to use boxplot.stats to set the box-and-whisker locations
mybxp = function(x) {
  bxp = boxplot.stats(x)[["stats"]]
  names(bxp) = c("ymin", "lower", "middle", "upper", "ymax")
  return(bxp)
}

# Function to use boxplot.stats for the outliers
myout = function(x) {
  data.frame(y=boxplot.stats(x)[["out"]])
}
```
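To see what these helpers hand to `stat_summary`, here they are applied to a small hypothetical vector `x` (toy data, not the question's). `mybxp` returns a named vector whose names match the aesthetics the boxplot geom expects, and `myout` returns the outliers as a one-column data frame:

```r
# Same helpers as above, repeated so this snippet runs on its own
mybxp = function(x) {
  bxp = boxplot.stats(x)[["stats"]]
  names(bxp) = c("ymin", "lower", "middle", "upper", "ymax")
  return(bxp)
}
myout = function(x) {
  data.frame(y=boxplot.stats(x)[["out"]])
}

# Hypothetical toy data: hinges are 3 and 7, so the upper fence is 7 + 1.5 * (7 - 3) = 13
x = c(1, 2, 3, 4, 5, 6, 7, 8, 20)

mybxp(x)  # ymin 1, lower 3, middle 5, upper 7, ymax 8 (the whisker stops at 8)
myout(x)  # one row, y = 20, because 20 lies beyond the fence
```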
Now we use those functions in `stat_summary` to draw the boxplot, as in the example below:
```r
library(ggplot2)

ggplot(my.df.long, aes(x=variable, y=vals)) +
  stat_summary(fun.data=mybxp, geom="boxplot") +  # box and whiskers from boxplot.stats
  stat_summary(fun.data=myout, geom="point") +    # outliers from boxplot.stats
  theme_bw() + coord_flip()
```
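The plotting code relies on the question's data, which isn't reproduced here: a wide data frame `my.df` with columns `a` and `b`, and a long version `my.df.long` with columns `variable` and `vals`. If you want to run the snippets end to end, the following is a hypothetical stand-in. The simulated log-normal values are an assumption, chosen only because they are skewed enough to make the log-scale comparison interesting. A few values could fall outside the `limits=c(5,1000)` used later, in which case ggplot2 drops them with a warning.

```r
# Hypothetical stand-in for the question's data (NOT the original data)
set.seed(1)
n = 200

# Wide format: two right-skewed columns, as referenced by my.df$a and my.df$b
my.df = data.frame(a = rlnorm(n, meanlog = 3.5, sdlog = 0.7),
                   b = rlnorm(n, meanlog = 4.5, sdlog = 0.6))

# Long format with the column names used in aes(x=variable, y=vals)
my.df.long = data.frame(variable = factor(rep(c("a", "b"), each = n)),
                        vals = c(my.df$a, my.df$b))

str(my.df.long)
```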
Now for the log-transformation issue. The plots below show, respectively, no transformation, `scale_y_log10`, and `coord_trans(y="log10")`. In addition, I've used `geom_hline` to add dotted lines at each of the box-and-whisker values (computed from the untransformed data), and I've added text to show the actual values. To reduce clutter, I've removed the outlier points, and I've faded out the boxplots a bit so that the other components show up better.
```r
# Written for ggplot2 >= 3.4.0 (uses `fun`, `after_stat()` and `linewidth`)
library(ggplot2)

# Set up common plot elements
p = ggplot(my.df.long, aes(x=variable, y=vals)) +
  # Dotted reference lines at the box-and-whisker values of the raw data
  geom_hline(yintercept=mybxp(my.df$a), colour="red", lty="11", linewidth=0.3) +
  geom_hline(yintercept=mybxp(my.df$b), colour="blue", lty="11", linewidth=0.3) +
  # Semi-transparent boxplots so the lines and labels remain visible
  stat_summary(fun.data=mybxp, geom="boxplot", colour="#000000A0", fatten=0.5) +
  #stat_summary(fun.data=myout, geom="point") +
  theme_bw() + coord_flip()

br = c(5,10,20,50,100,200,500,1000)

## Create plots
# Without log transformation
p1 = p + scale_y_continuous(breaks=br, limits=c(5,1000)) +
  stat_summary(fun=mybxp, aes(label=round(after_stat(y))), geom="text", size=3, colour="red") +
  ggtitle("No Transformation")

# With scale_y_log10: red labels show the logged values the stat receives,
# blue labels show the same values back-transformed with 10^
p2 = p + scale_y_log10(breaks=br, limits=c(5,1000)) + ggtitle("scale_y_log10") +
  stat_summary(fun=mybxp, aes(label=round(after_stat(y), 2)), geom="text", size=3, colour="red") +
  stat_summary(fun=mybxp, aes(label=round(10^after_stat(y))), geom="text", size=3,
               colour="blue", position=position_nudge(x=0.3))

# With coord_trans (this replaces coord_flip; a plot has only one coordinate system)
p3 = p + scale_y_continuous(breaks=br, limits=c(5,1000)) +
  stat_summary(fun=mybxp, aes(label=round(after_stat(y))), geom="text", size=3, colour="red") +
  coord_trans(y="log10") + ggtitle("coord_trans(y='log10')")
```
The three plots are shown below. Note that the last plot, using `coord_trans`, is not flipped, because `coord_trans` overrides `coord_flip`: a plot can have only one coordinate system, so the one added last replaces the other. You can probably use something like the code in this SO answer to flip the plot, but I haven't done that here.
The first plot, with no transformations, shows the correct values.
The third plot, using `coord_trans`, also has everything in the correct locations. Note that `coord_trans` changes the y coordinate system of the plot without changing the values of the plotted points, so the box-and-whisker statistics are still computed on the original data. It's the space itself that's been "distorted" to a log scale.
Now note that in the second plot, using `scale_y_log10`, the boxes are in the correct locations but the ends of the whiskers are in the wrong locations. On the other hand, comparison with the other two plots shows that all of the `geom_hline`s are in the correct locations. Also note that, unlike `coord_trans`, `scale_y_log10` takes the log of the points themselves, before any statistics are computed, and just relabels the y-axis breaks with the unlogged values, while leaving the "space" in which the points are plotted unchanged. You can see this by looking at the values in red text, which are the logged values. The values in blue text are the unlogged values.
See @dww's answer for an explanation of why `scale_y_log10` results in only the whisker ends being transformed incorrectly, while the box values are plotted in the right place.
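One way to see the effect numerically is the base-R sketch below. It is not @dww's code, and `x` is hypothetical toy data. It assumes, as described above, that under `scale_y_log10` the summary function receives `log10(vals)`, so the 1.5 * IQR fence inside `boxplot.stats` is measured in log units. With n = 9, `fivenum` picks single data points for the hinges and median, so those back-transform exactly; the fence, and therefore the whisker end, does not.

```r
mybxp = function(x) {
  bxp = boxplot.stats(x)[["stats"]]
  names(bxp) = c("ymin", "lower", "middle", "upper", "ymax")
  return(bxp)
}

# Hypothetical toy data
x = c(1, 2, 3, 4, 5, 6, 7, 8, 20)

# Statistics on the raw values (what the geom_hline lines and coord_trans use)
raw_stats = mybxp(x)

# Statistics on log10 values (what stat_summary sees under scale_y_log10),
# back-transformed with 10^ so they can be compared in the original units
log_stats = 10^mybxp(log10(x))

round(rbind(raw = raw_stats, log_scale = log_stats), 3)
# lower/middle/upper agree (3, 5, 7); ymax is 8 on the raw scale but 20 on the
# log scale, because in log units 20 falls inside the 1.5 * IQR fence
```

For a sample size where `fivenum` averages two values, the back-transformed box edges would also shift slightly, since 10^ of an average of logs is a geometric rather than an arithmetic mean.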
`?geom_boxplot`. `ggplot` and `boxplot` use different methods of calculating the "hinges". – Mike H.

`scale_x_log10` is the same as using `log(vals)` as the y variable. – aosmith
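The hinge difference mentioned in the first comment can be checked in base R. `boxplot.stats` (and therefore `mybxp`) takes its hinges from `fivenum()`, whereas `geom_boxplot`'s default statistic uses `quantile()` with R's default type 7. On a small hypothetical vector the two disagree:

```r
x = 1:10

# Tukey hinges, as used by base boxplot / boxplot.stats
tukey_hinges = fivenum(x)[c(2, 4)]
tukey_hinges     # 3 and 8

# Type-7 quartiles, as used by geom_boxplot's default stat
type7_quartiles = quantile(x, c(0.25, 0.75))
type7_quartiles  # 3.25 and 7.75
```

This is one reason to pass `mybxp` to `stat_summary`, as above, when the boxes should match base `boxplot` exactly.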