I would like to use ggplot2 to illustrate the difference between two similar density distributions. Here is a toy example of the type of data I have:
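library(ggplot2)

# Make toy data: two groups of different sizes with slightly shifted means
n_sp <- 100000
n_dup <- 50000
D <- data.frame(
  event = c(rep("sp", n_sp), rep("dup", n_dup)),
  q = c(rnorm(n_sp, mean = 2.0), rnorm(n_dup, mean = 2.1))
)

# Standard density plot, one line per group
# (after_stat(density) replaces the deprecated ..density.. notation; needs ggplot2 >= 3.3.0)
ggplot(D, aes(x = q, y = after_stat(density), col = event)) +
  geom_freqpoly()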
Rather than plotting the density for each category (dup and sp) separately as above, how could I plot a single line that shows the difference between these distributions?
In the toy example above, if I subtracted the dup density distribution from the sp density distribution, the resulting line would be above zero on the left side of the plot (since there is an abundance of smaller sp values) and below zero on the right (since there is an abundance of larger dup values). Note that there may be a different number of observations of type dup and sp.
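To make the intended subtraction concrete, here is a minimal sketch of one way it might be done. It assumes a kernel density estimate from base R's density() is an acceptable stand-in for the binned density above. The names rng, d_sp, d_dup and diff_df, and the fixed seed, are introduced only for illustration. Both estimates are evaluated on the same grid so they can be subtracted point by point, and because each estimate integrates to 1, the unequal group sizes should not affect the comparison.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Same toy data as in the question
n_sp <- 100000
n_dup <- 50000
D <- data.frame(
  event = c(rep("sp", n_sp), rep("dup", n_dup)),
  q = c(rnorm(n_sp, mean = 2.0), rnorm(n_dup, mean = 2.1))
)

# Evaluate both kernel density estimates on one shared grid
# (same from/to/n), so the y values line up and can be subtracted.
rng <- range(D$q)
d_sp <- density(D$q[D$event == "sp"], from = rng[1], to = rng[2], n = 512)
d_dup <- density(D$q[D$event == "dup"], from = rng[1], to = rng[2], n = 512)

# sp density minus dup density at each grid point
diff_df <- data.frame(q = d_sp$x, diff = d_sp$y - d_dup$y)

# Single line showing the difference, with a reference line at zero
p <- ggplot(diff_df, aes(x = q, y = diff)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_line() +
  labs(y = "density(sp) - density(dup)")
print(p)
```

Note that density() picks a bandwidth separately for each group by default, so the two estimates are smoothed slightly differently; passing the same bw to both calls would be one way to avoid that.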
More generally - what is the best way to show differences between similar density distributions?