
I've been looking around online and can't find a clear answer on how to fill different sections under the normal distribution using ggplot and stat_function with dnorm.

#Parameters 

mu = 18.7
sigma = 36.4
n = 168
x = c(mu - 4*sigma/sqrt(n), mu + 4*sigma/sqrt(n))

Here's what I have for the plot:

ggplot(data, aes(x)) + 
  stat_function(fun = dnorm,
                args = list(mean = mu, sd = sigma/sqrt(n)), 
                geom = "area", 
                fill = "steelblue")

Plot

What I want is to fill the first standard deviation one color, the second another color, and so on. I've also noticed that the curve does not appear smooth. Any reason why?

Thanks in advance.

Regarding the "curve does not appear smooth": stat_function has an argument n for how many points along the curve to use. Its default is 101. You can use higher values of n to make the curve smoother. - Gregor Thomas
Filling different regions with different colors suggests that the regions must be different. As crazy as that may sound, realize that your code above is making just one region. BTW, please test the code you post to questions. In this case, Error: You're passing a function as global data, because you're suggesting that data is a data.frame object when to the rest of us, it is the utils::data function. - r2evans
I completely forgot to add the data frame section of code, sorry! - LPKirby

1 Answer

To get different colors with stat_function, you can use the xlim argument, e.g.,

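library(ggplot2)

# Base layer: the whole curve, drawn with more points (n) so it looks smooth
ggplot(data.frame(x), aes(x)) + 
  stat_function(fun = dnorm,
                args = list(mean = mu, sd = sigma/sqrt(n)), 
                geom = "area", 
                fill = "steelblue", n = 1001) +
  # Second layer: the same curve, restricted with xlim and filled differently
  stat_function(fun = dnorm,
                args = list(mean = mu, sd = sigma/sqrt(n)), 
                geom = "area", 
                fill = "chartreuse", n = 1001, xlim = c(mu - sigma / sqrt(n), mu)) 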

That could get tedious, so if you want every standard deviation different it would be simpler to generate the function data and use geom_area directly:

sim_dat = data.frame(x = seq(x[1], x[2], length.out = 1001))
sim_dat$y = dnorm(sim_dat$x, mean = mu, sd = sigma / sqrt(n))
sim_dat$sds = cut(sim_dat$x, breaks = c(-Inf, mu + sigma / sqrt(n) * (-3:3), Inf), labels = setdiff(-4:4, 0))

ggplot(sim_dat, aes(x, y, fill = sds)) + geom_area()
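If you want the k-th standard deviation on either side of the mean to share a color, one option is to keep sds as the grouping variable and fill by its absolute value. This is only a sketch: the names se and dist and the "Blues" palette are my own choices, not something from the question.

```r
library(ggplot2)

mu <- 18.7
sigma <- 36.4
n <- 168
se <- sigma / sqrt(n)  # sd of the plotted distribution (my name for it)

sim_dat <- data.frame(x = seq(mu - 4 * se, mu + 4 * se, length.out = 1001))
sim_dat$y <- dnorm(sim_dat$x, mean = mu, sd = se)
sim_dat$sds <- cut(sim_dat$x,
                   breaks = c(-Inf, mu + se * (-3:3), Inf),
                   labels = setdiff(-4:4, 0))

# Distance from the mean in whole SDs (1 to 4), ignoring the side
sim_dat$dist <- factor(abs(as.integer(as.character(sim_dat$sds))))

# group = sds keeps the left and right bands as separate polygons;
# fill = dist gives matching bands the same color
p <- ggplot(sim_dat, aes(x, y, group = sds, fill = dist)) +
  geom_area(position = "identity") +
  scale_fill_brewer(palette = "Blues", direction = -1, name = "SDs from mean")
p
```

Grouping by sds matters here: if you only set fill = dist, each color would be drawn as a single area spanning both sides of the mean.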

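Going back to the stat_function approach: if you would rather stay with it, you can build one layer per band in a loop instead of typing each call out. Again a rough sketch, where se, bands, and band_layers are names I made up and the colors are arbitrary.

```r
library(ggplot2)

mu <- 18.7
sigma <- 36.4
n <- 168
se <- sigma / sqrt(n)  # sd of the plotted distribution (my name for it)

# Lower and upper edge of each one-SD-wide band from -4 to +4 SDs
bands <- data.frame(lower = mu + se * (-4:3),
                    upper = mu + se * (-3:4))
# Arbitrary colors, mirrored around the mean
bands$fill <- c("grey80", "khaki", "chartreuse", "steelblue",
                "steelblue", "chartreuse", "khaki", "grey80")

# One stat_function layer per band, each restricted with xlim
band_layers <- lapply(seq_len(nrow(bands)), function(i) {
  stat_function(fun = dnorm,
                args = list(mean = mu, sd = se),
                geom = "area",
                fill = bands$fill[i],
                n = 201,
                xlim = c(bands$lower[i], bands$upper[i]))
})

# A list of layers can be added to a plot in one go
p <- ggplot(data.frame(x = c(mu - 4 * se, mu + 4 * se)), aes(x)) +
  band_layers
p
```

Because fill is set as a constant in each layer rather than mapped inside aes(), this version draws no legend.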