0
votes

I was creating histograms with ggplot2 in R, with the bins colored by group, and noticed something odd. When the bins are colored by group through the fill aesthetic, the density values of the histogram come out wrong.

Here is the data.

library(ggplot2)

set.seed(42)
x <- rnorm(10000,0,1)
df <- data.frame(x=x, b=x>1)

This is a histogram without fill.

ggplot(df, aes(x = x)) + 
  geom_histogram(aes(y=..density..))

This is a histogram with fill.

ggplot(df, aes(x = x, fill=b)) + 
  geom_histogram(aes(y=..density..))

You can see the latter is pretty crazy. The left side of the bins is sticking out. The density values of the bins of each color are obviously wrong.

I thought about this issue for a while. The data can't be wrong, because the first histogram looked normal, so it must be something in ggplot2 or the geom_histogram function. I googled "geom_histogram density fill" but couldn't find much help.

I want the end product to look like:

  1. Separated by colors as you see in the second histogram
  2. Size and shape identical to the first histogram
  3. The vertical axis being density

How would you deal with this issue?

2
What's happening is that you are plotting densities, which normalizes each group to integrate to 1. Try frequency histograms to see: ggplot(df, aes(x = x)) + geom_histogram(); ggplot(df, aes(x = x, fill=b)) + geom_histogram() - jrlewi
What is it that you are trying to accomplish? What do you want the end product to look like? What does "the bins of a histogram are separated by colors" mean? - Elin
@Nate The shape is perfect, but the density is twice as large as in the first histogram. I want it to be identical. - dixhom
@Elin I updated the original post. - dixhom
Density is not percent or proportion. Each group will have a total area of one, so yes, it will literally be twice as large with two groups compared to one group. - Elin
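
The point made in these comments can be checked numerically. The following is a minimal sketch in base R, not the code ggplot2 runs internally: it uses hist() with plot = FALSE, and the bin width of 0.25 and the helper name area are arbitrary choices made for illustration. It shows that a density histogram has area 1 whether it is computed on the whole sample or on one group, so two stacked per-group density histograms have a total area of 2.

```r
set.seed(42)
x <- rnorm(10000, 0, 1)
b <- x > 1

# Shared breaks so every histogram uses the same bins (width 0.25 is arbitrary).
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 0.25)

# Area under a density histogram: sum of bar height times bin width.
# `area` is a hypothetical helper, not part of any package.
area <- function(values) {
  h <- hist(values, breaks = breaks, plot = FALSE)
  sum(h$density * diff(h$breaks))
}

area_all   <- area(x)      # whole sample
area_false <- area(x[!b])  # group with b == FALSE
area_true  <- area(x[b])   # group with b == TRUE

# Each area is 1 up to floating-point error, so stacking both groups gives 2.
stacked_area <- area_false + area_true
print(c(all = area_all, false = area_false, true = area_true,
        stacked = stacked_area))
```

Because the smaller group (x > 1) is rescaled to area 1 on its own, its bars end up much taller than they are in the unfilled histogram.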

2 Answers

1
votes

I think what you may want is this:

ggplot(df, aes(x = x, fill=b)) + 
  geom_histogram()

That is, plot counts rather than the density. As mentioned in the comments above, asking for the density adds an extra calculation: each fill group is normalized separately.

One thing that is important (in my opinion) is that histograms are graphs of one variable. As soon as you start adding data from other variables you start to change them more into bar charts or something else like that.

You will want to work on setting the axis manually if you want it to range from 0 to 0.4.
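
If the density axis from the question is still wanted, one possible approach is to scale the counts yourself instead of using the per-group density. The sketch below is an assumption-laden illustration rather than an official recipe: it relies on after_stat(), which needs ggplot2 3.3.0 or later (older versions would use the ..count.. notation), on the count and width variables computed by stat_bin, and on bins = 30 as an arbitrary choice. Dividing each bar by the total count over all groups and by the bin width makes the stacked bars integrate to 1 overall.

```r
library(ggplot2)

set.seed(42)
x <- rnorm(10000, 0, 1)
df <- data.frame(x = x, b = x > 1)

# count / sum(count) / width: sum(count) is taken over all bars in the layer
# (both fill groups), so the stacked histogram has total area 1, like the
# unfilled density histogram, instead of area 1 per group.
p <- ggplot(df, aes(x = x, fill = b)) +
  geom_histogram(aes(y = after_stat(count / sum(count) / width)), bins = 30) +
  ylab("density")

print(p)
```

Note that sum(count) here runs over the whole layer, so with facets the total would be shared across panels, which may not be what you want.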

0
votes

When you provide a column name for the fill parameter in ggplot, it groups the observations by that column and plots each group in its own color.
If you want a single color for the whole plot, just specify the color you want:

FIXED

ggplot(df, aes(x = x)) + 
  geom_histogram(aes(y=..density..),fill="Blue")