2
votes

I have a histogram created in ggplot2 and I'd like to overlay a density line for the same data. Importantly, I don't want to convert the histogram to density values; I want to keep counts (N) on the y-axis. Is there a way to overlay the density curve on the histogram by scaling up the density curve rather than transforming the histogram?

The histogram for this data:

img1

The initial density plot for the same data:

img2

An overlay close to what I want, but with density on the y-axis instead of counts:

img3

2
See here on making an R question that folks can help with. Right now we've got neither code nor data, so it's hard to do more than guess what you're doing exactly. – camille

2 Answers

4
votes

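You'll want to use the ..count.. computed variable created by stat_density, and then scale it by the bin width (5 in this example). ..count.. is the density multiplied by the number of points, so multiplying it by the bin width puts the curve on the same scale as the histogram counts.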

library(ggplot2)
set.seed(15)
df <- data.frame(x=rnorm(500, sd=10))
ggplot(df, aes(x=x)) + 
  geom_histogram(colour="black", fill="white", binwidth = 5 ) +
  geom_density(aes(y=..count..*5), alpha=.2, fill="#FF6666") 

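In ggplot2 3.4.0 and later the dot-dot notation is deprecated in favour of after_stat(). A minimal sketch of the same plot in that style follows; the variable names bw and p are mine, not part of the original answer, and it assumes ggplot2 3.3.0 or newer is installed.

```r
library(ggplot2)

set.seed(15)
df <- data.frame(x = rnorm(500, sd = 10))

# Bin width, defined once so the histogram and the density scaling stay in sync
bw <- 5

p <- ggplot(df, aes(x = x)) +
  geom_histogram(colour = "black", fill = "white", binwidth = bw) +
  # after_stat(count) is density * n; multiplying by the bin width
  # puts the curve on the histogram's count scale
  geom_density(aes(y = after_stat(count) * bw), alpha = 0.2, fill = "#FF6666")

p
```

Keeping the bin width in one variable avoids the curve silently going out of scale if the binwidth is changed later.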

3
votes

Yes, but you have to choose the right scale factor. Since you do not provide any data, I will illustrate with the built-in iris data.

H = hist(iris$Sepal.Width, main="")

Base histogram

Since the heights are the frequency counts, the sum of the heights should equal the number of points, which is nrow(iris). The area of the boxes is the sum of the heights times the width of the boxes (all boxes have the same width here), so

  Area = nrow(iris) * (H$breaks[2] - H$breaks[1])

In this case, it is 150 * 0.2 = 30, but better to keep it as a formula.
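As a quick sanity check of that formula, the bar area can be computed directly from the histogram object and compared with the shortcut. This is only a sketch; the names bin_width, area_bars and area_formula are introduced here for illustration, and plot = FALSE is used so that nothing is drawn.

```r
# Compute the histogram without drawing it
H <- hist(iris$Sepal.Width, plot = FALSE)

# Width of the first bin; the shortcut assumes every bin has this width
bin_width <- H$breaks[2] - H$breaks[1]

# Area of the bars: each count times the width of its own bin
area_bars <- sum(H$counts * diff(H$breaks))

# Shortcut from the text: number of points times the common bin width
area_formula <- nrow(iris) * bin_width

# Compare the two with a floating-point tolerance
all.equal(area_bars, area_formula)
```

If the breaks were not equally spaced, the two numbers would differ and a single scale factor would no longer apply.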

Now the area under the standard density curve is one, so the scale factor that we want to use is nrow(iris) * (H$breaks[2] - H$breaks[1]) to make the areas the same. Where do you apply the scale factor?

DENS = density(iris$Sepal.Width)
str(DENS)
List of 7
 $ x        : num [1:512] 1.63 1.64 1.64 1.65 1.65 ...
 $ y        : num [1:512] 0.000244 0.000283 0.000329 0.000379 0.000436 ...
 $ bw       : num 0.123
 $ n        : int 150
 $ call     : language density.default(x = iris$Sepal.Width)
 $ data.name: chr "iris$Sepal.Width"
 $ has.na   : logi FALSE

We want to scale the y values for the density plot, so we use:

DENS$y = DENS$y * nrow(iris) * (H$breaks[2] - H$breaks[1])

and add the line to the histogram

lines(DENS)

Histogram with density curve
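To confirm that the scaled curve really covers about the same area as the bars, the curve can be integrated numerically. The sketch below uses a simple trapezoidal rule; scale_factor, area_curve and area_bars are illustrative names, and the match is only approximate because density() evaluates the curve on a finite grid.

```r
H <- hist(iris$Sepal.Width, plot = FALSE)
DENS <- density(iris$Sepal.Width)

# Same scale factor as in the text: number of points times bin width
scale_factor <- nrow(iris) * (H$breaks[2] - H$breaks[1])
DENS$y <- DENS$y * scale_factor

# Trapezoidal rule over the grid points returned by density()
area_curve <- sum(diff(DENS$x) * (head(DENS$y, -1) + tail(DENS$y, -1)) / 2)

# Area of the histogram bars for comparison
area_bars <- sum(H$counts * diff(H$breaks))

c(curve = area_curve, bars = area_bars)
```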

You can make this a bit nicer by adjusting the bandwidth for the density calculation

H = hist(iris$Sepal.Width, main="")
DENS = density(iris$Sepal.Width, adjust=0.7)
DENS$y = DENS$y * nrow(iris) * (H$breaks[2] - H$breaks[1])
lines(DENS)

Histogram with adjusted density curve
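If you do this often, the steps can be wrapped in a small function. The helper below, hist_with_density, is hypothetical (it is not part of base R or of the steps above as written), assumes x has no missing values, and stops if the bins are not all the same width, since a single scale factor is only valid in that case.

```r
# Hypothetical helper: count-scale histogram with a scaled density overlay
hist_with_density <- function(x, adjust = 1, ...) {
  # Draw the histogram; extra arguments (main, breaks, ...) go to hist()
  h <- hist(x, ...)
  bin_width <- diff(h$breaks)

  # A single scale factor only works when all bins have the same width
  stopifnot(isTRUE(all.equal(min(bin_width), max(bin_width))))

  # Scale the density so its area matches the area of the bars
  d <- density(x, adjust = adjust)
  d$y <- d$y * length(x) * bin_width[1]
  lines(d)

  # Return both objects invisibly for further use
  invisible(list(hist = h, density = d))
}

res <- hist_with_density(iris$Sepal.Width, adjust = 0.7, main = "")
```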