58
votes

I am trying to make a histogram of density values and overlay that with the curve of a density function (not the density estimate).

Using a simple standard normal example, here is some data:

x <- rnorm(1000)

I can do:

q <- qplot( x, geom="histogram")
q + stat_function( fun = dnorm )

But this puts the histogram on a frequency (count) scale, not a density scale. With ..density.. I can get the proper scale on the histogram:

q <- qplot( x,..density.., geom="histogram")
q

But now this gives an error:

q + stat_function( fun = dnorm )

Is there something I am not seeing?

Another question: is there a way to plot the curve of a function, like curve(), but not as a layer?

3
The issue is that you have defined a global y for your plot using ..density.. inside qplot. This confuses stat_function. The easiest fix would be to write qplot(x, geom = 'blank') + geom_histogram(aes(y = ..density..)) + stat_function(fun = dnorm). See my detailed answer below. – Ramnath
The equivalent to curve(dnorm, -4, 4) would be qplot(x = -4:4, stat = 'function', fun = dnorm, geom = 'line') – Ramnath
Ah right, I tried that with the function as first argument but see now what went wrong. Thanks! – Sacha Epskamp
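
The curve() equivalent in the comment above passes stat and fun through qplot(), which appears to be deprecated in current ggplot2 releases. A minimal sketch of the same idea with ggplot() follows. The object name p_curve and the two-row data frame are illustrative choices, not part of the original thread; the data frame is only there to supply the x-range.

```r
library(ggplot2)

# The data frame only defines the x-range (-4 to 4);
# stat_function() evaluates dnorm on an evenly spaced grid
# across that range (101 points by default).
p_curve <- ggplot(data.frame(x = c(-4, 4)), aes(x)) +
  stat_function(fun = dnorm)

print(p_curve)
```

This still builds a layer internally, but it needs no data of its own, which is roughly what curve(dnorm, -4, 4) does in base graphics.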

3 Answers

58
votes

Here you go!

# create some data to work with
x = rnorm(1000)

# overlay histogram, empirical density and normal density
p0 = qplot(x, geom = 'blank') +   
  geom_line(aes(y = ..density.., colour = 'Empirical'), stat = 'density') +  
  stat_function(fun = dnorm, aes(colour = 'Normal')) +                       
  geom_histogram(aes(y = ..density..), alpha = 0.4) +                        
  scale_colour_manual(name = 'Density', values = c('red', 'blue')) + 
  theme(legend.position = c(0.85, 0.85))

print(p0)
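
On newer ggplot2 versions (3.4.0 and later, as far as I know), qplot() and the ..density.. notation trigger deprecation warnings. A hedged sketch of the same overlay using ggplot() and after_stat(density) is below. The seed, the data frame name df, and bins = 30 are assumptions added for reproducibility, not part of the answer above.

```r
library(ggplot2)

set.seed(1)  # illustrative seed so the example is reproducible
df <- data.frame(x = rnorm(1000))

# y is mapped to the computed density inside the histogram layer only,
# so there is no global y aesthetic for stat_function() to inherit.
p <- ggplot(df, aes(x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, alpha = 0.4) +
  stat_function(fun = dnorm, colour = "red")

print(p)
```

Keeping the y mapping local to geom_histogram() is the same fix described in the comments on the question, just written without qplot().
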
37
votes

A more bare-bones alternative to Ramnath's answer, passing the observed mean and standard deviation, and using ggplot instead of qplot:

df <- data.frame(x = rnorm(1000, 2, 2))

# overlay histogram and normal density
ggplot(df, aes(x)) +
  geom_histogram(aes(y = stat(density))) +
  stat_function(
    fun = dnorm, 
    args = list(mean = mean(df$x), sd = sd(df$x)), 
    lwd = 2, 
    col = 'red'
  )


6
votes

What about using geom_density() from ggplot2? Like so:

df <- data.frame(x = rnorm(1000, 2, 2))

ggplot(df, aes(x)) + geom_histogram(aes(y=..density..)) + geom_density(col = "red")