I'm currently plotting some data (response times in ms) with geom_boxplot.
I have a question:
When you adjust the limits on the y-axis, does ggplot2 disregard any values above the limit when drawing the boxplots and calculating the error bars?
The data comprise over 20k entries, and I'm not sure a sample would be of much use, as this is more a question about functionality.
Here is the code I use:
# Summary function: notch (confidence interval) limits and median from boxplot.stats()
f <- function(x) {
  ans <- boxplot.stats(x)
  data.frame(ymin = ans$conf[1], ymax = ans$conf[2], y = ans$stats[3])
}
RTs.box <- ggplot(mean.vis.aud.long, aes(x = Report, y = RTs, fill = Report)) +
  theme_bw() +
  facet_grid(Audio ~ Visual)
RTs.box +
  geom_boxplot(alpha = .8) +
  geom_hline(yintercept = .333, linetype = 3, alpha = .8) +
  theme(legend.position = "none") +
  ylab("Response Times (ms)") +
  scale_fill_grey(start = .4) +
  labs(title = expression("Visual Condition")) +
  theme(plot.title = element_text(size = rel(1))) +
  theme(panel.background = element_rect()) +
  # line below for shaded confidence intervals
  stat_summary(fun.data = f, geom = "crossbar",
               colour = NA, fill = "skyblue", width = 0.75, alpha = .9) +
  ylim(0, 1000)  # this is the value that I change that results in different plots and shaded confidence intervals
Here is the plot with
ylim(0,1000)
And using the same data but changing the limit to
ylim(0,3000)
results in this plot:
As you can see, the values in the boxplots appear to change with the limit used. Instead of the boxes being drawn up to the edge of the limit, the percentiles themselves are reduced. This is apparent when you compare the middle boxplot in the top-left panel of both grids.
The confidence intervals differ between the two plots as well.
Does this mean geom_boxplot is discarding the data above the limit, or is there something I'm missing?
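To test that suspicion outside ggplot2, here is a minimal sketch using base R only. The data are simulated (the names rts and kept are made up for illustration and are not from my data set), and it assumes that ylim() drops out-of-range values before the statistics are computed, which is exactly the part I'm unsure about.

```r
# Simulated response times in ms (hypothetical data, not my real data set)
set.seed(1)
rts <- c(rnorm(1000, mean = 600, sd = 150),    # bulk of the responses
         runif(100, min = 1000, max = 3000))   # slow outliers above 1000 ms

# Statistics computed on all values, which is what I want
all_stats <- boxplot.stats(rts)

# Statistics after discarding everything outside 0-1000 ms, which is what
# I suspect ylim(0, 1000) does before the boxplot is drawn (assumption)
kept <- rts[rts >= 0 & rts <= 1000]
cut_stats <- boxplot.stats(kept)

# Compare the median and upper hinge (roughly the 75th percentile)
print(rbind(all = all_stats$stats[3:4], truncated = cut_stats$stats[3:4]))

# Compare the notch (confidence interval) limits used by f() above
print(rbind(all = all_stats$conf, truncated = cut_stats$conf))
```

With data like this, the median and upper hinge from the truncated values come out lower than the ones from the full data, which matches the pattern in my two plots.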
I want to include all the data when computing the boxplots and confidence intervals, but limit the scale so the plot can be seen clearly. That means not seeing some major outliers in the data, but for my purposes that is fine.
Does anyone have suggestions as to what is going on here, and how to get around it without dropping the values outside the chosen visual range from the calculations?
Thanks as always.