I have a data frame of about 108 million rows in 7 columns. I use this R script to make a boxplot of it:
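```r
library(ggplot2)

ggplot(expanded_results, aes(factor(hour), dynamic_nox)) +
  geom_boxplot(fill = "#6699FF", outlier.size = 0.5, lwd = .1) +
  scale_y_log10() +
  # Hourly mean joined by a line. The scale transformation is applied before
  # stats, so this is the mean of log10 values (i.e. a geometric mean).
  # `fun =` replaces the deprecated `fun.y =` (ggplot2 >= 3.3.0); colour is
  # set outside aes() so the line is actually drawn in red.
  stat_summary(fun = mean, geom = "line", aes(group = 1), colour = "red") +
  ylab(expression(Exposure~to~NO[x])) +
  xlab(expression(Hour~of~the~day)) +
  ggtitle("Hourly exposure to NOx") +
  theme(axis.text = element_text(size = 12, colour = "black"),
        axis.title = element_text(size = 12, colour = "black"),
        plot.title = element_text(size = 12, colour = "black"),
        legend.position = "none")
```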
The graph looks like this:

[Figure: "Hourly exposure to NOx", boxplots by hour on a log10 y axis]

It's pretty much fine; however, it would be better to have a labelled value towards the top of the y axis. Given that the y axis is on a log10 scale, I guess it should be something like 1000, but I'm not sure how to do this.
Any ideas please?
EDIT: In response to DrDom, who suggested adding `scale_y_log10(breaks=c(0,10,100,1000))`, the output of doing that is this:

[Figure: the boxplot with DrDom's suggested breaks]
Doing the following instead:

```r
scale_y_log10(breaks=c(0,10,100,1000), limits=c(0,1000))
```

gives this error:

```
Error in seq.default(dots[[1L]][[1L]], dots[[2L]][[1L]], length = dots[[3L]][[1L]]) :
  'from' cannot be NA, NaN or infinite
```
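The most likely cause of both problems is that, on a log10 scale, every value, break and limit is passed through `log10()`, and `log10(0)` is `-Inf`. A break at 0 therefore cannot be drawn (it is silently dropped), and a lower limit of 0 gives the infinite `from` value that `seq()` complains about. The snippet below only demonstrates the transformation; the filtering step is an illustration, not code from the thread.

```r
# On a log10 scale every y value (and every break and limit) goes through log10().
vals <- c(0, 10, 100, 1000)
transformed <- log10(vals)
print(transformed)  # the first element is -Inf, the rest are 1, 2 and 3

# -Inf cannot be placed on the axis, so keep only the values that stay finite
# after the transformation; these are safe to use as breaks.
safe_breaks <- vals[is.finite(log10(vals))]
print(safe_breaks)
```

With those breaks, something like `scale_y_log10(breaks = c(10, 100, 1000), limits = c(NA, 1000))` avoids the infinite limit, since `NA` tells ggplot2 to use the data minimum for that end. Be aware that scale limits remove any observations outside them (here, values above 1000) before the boxplots are computed.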
In response to Jaap, who suggested the following code:
```r
library(ggplot2)
library(scales)

ggplot(expanded_results, aes(factor(hour), dynamic_nox)) +
  geom_boxplot(fill = "#6699FF", outlier.size = 0.5, lwd = .1) +
  stat_summary(fun = mean, geom = "line", aes(group = 1), colour = "red") +
  scale_y_continuous(breaks = c(0, 10, 100, 1000, 3000), trans = "log1p") +
  labs(title = "Hourly exposure to NOx",
       x = expression(Hour~of~the~day),
       y = expression(Exposure~to~NO[x])) +
  theme(axis.text = element_text(size = 12, colour = "black"),
        axis.title = element_text(size = 12, colour = "black"),
        plot.title = element_text(size = 12, colour = "black"),
        legend.position = "none")
```
It produces this graph:

[Figure: the boxplot produced by Jaap's code]

Have I done something wrong? I'm still missing a '1000' tick label. A tick in between 10 and 100 would also be good, given that is where most of the data is.
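One possible way to get both the 1000 label and an intermediate tick, sketched under assumptions: ggplot2 3.3.0 or later (for `fun =`), and a small simulated data set standing in for `expanded_results`, which is not available here. The likely reason 1000 is missing is that ggplot2 drops breaks that fall outside the range of the scale, so 1000 is only labelled if the axis actually reaches 1000; likewise, a tick between 10 and 100 only appears if a break is placed there. The break values and the use of `expand_limits()` are choices made for this sketch, not something suggested in the thread.

```r
library(ggplot2)

# Hypothetical stand-in for expanded_results: 24 hours x 500 log-normal
# readings (the real data has ~108 million rows and is not available here).
set.seed(1)
expanded_results <- data.frame(
  hour = rep(0:23, each = 500),
  dynamic_nox = rlnorm(24 * 500, meanlog = 3, sdlog = 1)
)

p <- ggplot(expanded_results, aes(factor(hour), dynamic_nox)) +
  geom_boxplot(fill = "#6699FF", outlier.size = 0.5) +
  # Extra breaks at 3, 30 and 300 put ticks between the powers of ten.
  # 0 is left out: log10(0) is -Inf and cannot be placed on the axis.
  scale_y_log10(breaks = c(1, 3, 10, 30, 100, 300, 1000)) +
  # Breaks outside the scale range are dropped, so make sure the axis reaches
  # 1000. Unlike scale limits, expand_limits() does not remove any data.
  expand_limits(y = 1000) +
  stat_summary(fun = mean, geom = "line", aes(group = 1), colour = "red") +
  labs(title = "Hourly exposure to NOx",
       x = expression(Hour~of~the~day),
       y = expression(Exposure~to~NO[x])) +
  theme(axis.text = element_text(size = 12, colour = "black"),
        axis.title = element_text(size = 12, colour = "black"),
        plot.title = element_text(size = 12, colour = "black"))

# Draw it with print(p)
```

Setting `limits = c(NA, 1000)` on the scale would also make the axis reach 1000, but observations outside scale limits are removed before the boxplot statistics are computed, whereas `expand_limits()` only widens the axis and leaves the data untouched.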
`scale_y_log10(breaks=c(0,10,100,1000))` or `scale_y_log10(breaks=c(0,10,100,1000), limits=c(0,1000))` – DrDom