4
votes

I would like to plot a distribution of counts using barplot() in R and place a boxplot underneath it to show the median, quartiles, and outliers. A not-too-elegant solution exists for combining a histogram with a boxplot: http://rgraphgallery.blogspot.com/2013/04/rg-plotting-boxplot-and-histogram.html.

Many sources argue that numerical data should be plotted with histograms and categorical data with bar plots. My data are numerical, and in fact on a ratio scale (they are counts), but because they are discrete I want columns with gaps between them, not columns that touch, which seems to be the only option with hist().

I currently have the following, but the bar plot and the boxplot do not align:

set.seed(476372)
counts1 <- rpois(10000, 3)
nf <- layout(mat = matrix(c(1, 2), 2, 1, byrow = TRUE), heights = c(3, 1))
par(mar = c(3.1, 3.1, 1.1, 2.1))
barplot(prop.table(table(counts1)))
boxplot(counts1, horizontal = TRUE, outline = TRUE, ylim = c(0, 12), frame = FALSE, width = 10)

Here is my question: how can I make them align?

2
If one of the answers meets your needs, can you please accept it? It provides closure within SO and (whether or not it's my answer) is a pat on the back to the answerer. – r2evans

2 Answers

5
votes

Another option that's similar but a little more work. This preserves the option for gaps between the bars:

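# Proportion of observations at each observed count value
tbl <- prop.table(table(counts1))
# Take bar positions from the table itself, so a count value that never
# occurs cannot shift the bars relative to their heights
vals <- as.numeric(names(tbl))
left <- vals - 0.4
right <- left + (2 * 0.4)
bottom <- rep(0, length(left))
top <- as.numeric(tbl)
# Shared x-range for both panels
xlim <- c(-0.5, 0.5) + range(counts1)

nf <- layout(mat = matrix(c(1, 2), 2, 1, byrow = TRUE), heights = c(3, 1))
par(mar = c(3.1, 3.1, 1.1, 2.1))
plot(NA, xlim = xlim, ylim = c(0, max(tbl)))
rect(left, bottom, right, top, col = 'gray')
boxplot(counts1, horizontal = TRUE, outline = TRUE, ylim = xlim, frame = FALSE, width = 10)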
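
As a side note on why rect() lines up where barplot() does not: barplot() places its bars on its own internal axis and returns their midpoints, and with the default width of 1 and spacing of 0.2 those midpoints fall at 0.7, 1.9, 3.1, ... rather than at the count values 0, 1, 2, ... The sketch below only inspects those positions; mids and vals are names introduced here for illustration.

```r
set.seed(476372)
counts1 <- rpois(10000, 3)
tbl <- prop.table(table(counts1))

# barplot() invisibly returns the x coordinate of each bar's midpoint
mids <- as.numeric(barplot(tbl))
vals <- as.numeric(names(tbl))

# With the defaults (width = 1, space = 0.2), bar i is centred at 1.2 * i - 0.5,
# so the bar positions drift away from the data values they represent
print(head(data.frame(value = vals, bar_mid = mids)))
```

The boxplot, in contrast, is drawn on the data scale, so the two panels only agree if the bars are also drawn at the data values, which is what the rect() call does.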


0
votes

Maybe use a "fake" histogram instead:

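# Start from a real histogram object, then overwrite its fields
tab <- table(counts1)
vals <- as.numeric(names(tab))
ht <- hist(counts1, breaks = 12, plot = FALSE)
ht$counts <- as.numeric(tab)
ht$density <- as.numeric(prop.table(tab))
# A histogram needs one more break than bars; half-integer breaks centre
# each bar on its count value (assumes every integer in the range occurs)
ht$breaks <- c(vals - 0.5, max(vals) + 0.5)
ht$mids <- vals

plot(ht, freq = FALSE, col = 3, main = "")
boxplot(counts1, horizontal = TRUE, outline = TRUE, ylim = range(ht$breaks), frame = FALSE, col = "green1", width = 10)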
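
A possible variation, not the answerer's method: rather than editing the histogram object by hand, hist() can be given half-integer breaks so that each bin is centred on one count value. This is only a sketch under the assumption that the data are non-negative integers; brks and ht2 are names introduced here. Note that the bars still touch, as in any histogram.

```r
set.seed(476372)
counts1 <- rpois(10000, 3)

# Breaks at half-integers, so every bin is centred on one count value
brks <- seq(min(counts1) - 0.5, max(counts1) + 0.5, by = 1)
ht2 <- hist(counts1, breaks = brks, plot = FALSE)

layout(mat = matrix(c(1, 2), 2, 1, byrow = TRUE), heights = c(3, 1))
par(mar = c(3.1, 3.1, 1.1, 2.1))
plot(ht2, freq = FALSE, col = "gray", main = "")
# Use the same x-range as the histogram panel
boxplot(counts1, horizontal = TRUE, outline = TRUE, ylim = range(brks), frame = FALSE)
```

Because both panels are drawn over range(brks), the median line of the boxplot should sit under the middle of the bar for that count.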
