
Using R ggplot2 to draw density plots for multiple groups

Using the following data.frame:

set.seed(1234)
df <- data.frame(
  sex=factor(rep(c("F", "M"), each=5)),
  weight=round(c(rnorm(5, mean=0, sd=0),
                 rnorm(5, mean=2, sd=5)))
)
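Because sd = 0, rnorm() simply returns the mean, so every woman in this data set weighs exactly 0. A quick way to confirm this is to split the weights by group. The sketch below rebuilds the same data.frame so it runs on its own; split() is the only addition.

```r
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 5)),
  weight = round(c(rnorm(5, mean = 0, sd = 0),
                   rnorm(5, mean = 2, sd = 5)))
)

# rnorm() with sd = 0 returns the mean itself, so every female weight is 0;
# only the male weights are actually random
split(df$weight, df$sex)
```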

Let's first plot only the female group:

library(ggplot2)
library(dplyr)
# Density of the female group on its own
ggplot(df %>% filter(sex == "F"), aes(x = weight, color = sex)) + geom_density()

Women-only density plot


But if we plot both men and women:

ggplot(df, aes(x=weight, color=sex)) + geom_density()

Density plot for both women and men


The density curve for the women now looks completely different.

I assumed that the density is calculated separately for each group, so adding another group (the men, in this case) shouldn't change the women's density.


1 Answer


All the women have a weight of 0, so the from and to limits passed to density() are both 0, which is why you get a vertical line. When the men are added, from and to become -10 and 7 (the range of weight across both groups), and each group's density is estimated over that grid with a bandwidth chosen by the nrd0 rule (see ?bw.nrd0; in this case it is about 3.9 for the men and 0.65 for the women). The smoothing kernel (Gaussian by default) then spreads the women's spike at 0 into the peaked shape. So the women's estimate itself does not depend on the men; what changes is the range of x over which it is evaluated and drawn.
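The bandwidths and the shared range mentioned above can be checked directly. This is a small sketch that assumes geom_density() is left at its default bw = "nrd0" and recomputes the bandwidth for each group with stats::bw.nrd0(); the name bw_by_sex is only illustrative.

```r
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 5)),
  weight = round(c(rnorm(5, mean = 0, sd = 0),
                   rnorm(5, mean = 2, sd = 5)))
)

# Bandwidth from the nrd0 rule, computed separately for each group.
# For the women both sd() and IQR() are 0, so bw.nrd0() falls back to
# a spread of 1, giving 0.9 * 5^(-1/5), roughly 0.65
bw_by_sex <- sapply(split(df$weight, df$sex), bw.nrd0)
round(bw_by_sex, 2)

# With both groups plotted, the x scale (and hence the from/to grid)
# spans the range of weight over all rows
range(df$weight)
```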

To get a better feel for what's going on, try other values for the density() parameters that geom_density() passes through, for example:

ggplot(df, aes(x=weight, color=sex)) + geom_density(kernel = 'triangular', bw = 0.5)

Density plot with a triangular kernel and bw = 0.5
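The same comparison can be made with stats::density() outside ggplot2. A minimal sketch, assuming the df from the question: it evaluates only the women's weights over the -10 to 7 grid used by the combined plot, once with the defaults (Gaussian kernel, nrd0 bandwidth) and once with the triangular kernel and bw = 0.5. The names women, d_gauss and d_tri are illustrative.

```r
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 5)),
  weight = round(c(rnorm(5, mean = 0, sd = 0),
                   rnorm(5, mean = 2, sd = 5)))
)
women <- df$weight[df$sex == "F"]

# Defaults: Gaussian kernel, bandwidth from bw.nrd0()
d_gauss <- density(women, from = -10, to = 7)
# Triangular kernel with a fixed bandwidth (in density(), bw is the kernel's sd)
d_tri <- density(women, kernel = "triangular", bw = 0.5, from = -10, to = 7)

# Peak heights near x = 0
c(gaussian = max(d_gauss$y), triangular = max(d_tri$y))

# The triangular kernel has compact support, so its curve is essentially zero
# for |x| > 2, while the Gaussian curve still has small tails there
c(gaussian = max(d_gauss$y[abs(d_gauss$x) > 2]),
  triangular = max(d_tri$y[abs(d_tri$x) > 2]))
```

The narrower, compactly supported kernel makes it easier to see that the women's curve is just a kernel placed at 0, stretched over whatever x range the plot happens to cover.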