What is the difference (if any) between geom_bar and geom_histogram in ggplot? They seem to produce the same plot and take the same parameters.
3 Answers
- Bar charts provide a visual presentation of categorical data.
- Histograms are used to plot the density of interval (usually numeric) data. Examples:
  - Distributions of age and height

See the geom_histogram help file; its examples show the distribution of movie ratings.
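A minimal sketch of the distinction, assuming ggplot2 is installed. It uses the mpg dataset bundled with ggplot2, chosen here only for illustration: geom_bar() counts the rows in each level of a categorical variable, while geom_histogram() first cuts a continuous variable into bins and then counts each bin.

```r
library(ggplot2)

# Categorical variable: geom_bar() counts the rows in each vehicle class
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# Continuous variable: geom_histogram() cuts highway mpg into bins of
# width 2, then counts the rows that fall in each bin
p_hist <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 2)

# layer_data() returns the data ggplot2 computed for each layer
bar_data <- layer_data(p_bar)
hist_data <- layer_data(p_hist)

nrow(bar_data)   # one row (bar) per distinct class
nrow(hist_data)  # one row (bar) per bin
```

Both layers end up counting every row of mpg; they differ only in how the rows are grouped before counting.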
ggplot2
After a bit more investigating, I think in ggplot2 there is no difference between geom_bar and geom_histogram. From the docs:
geom_histogram(mapping = NULL, data = NULL, stat = "bin", position = "stack", ...)
geom_bar(mapping = NULL, data = NULL, stat = "bin", position = "stack", ...)
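The signatures quoted above come from an older ggplot2. In ggplot2 2.0.0 and later, geom_bar() defaults to stat = "count" instead of "bin", so the two defaults no longer match. A quick sketch for checking the defaults in whatever version you have installed (the values in the comments assume a current release):

```r
library(ggplot2)

# Default value of the 'stat' argument for each layer constructor
formals(geom_bar)$stat        # "count" in ggplot2 >= 2.0.0
formals(geom_histogram)$stat  # "bin"

# Which ggplot2 these defaults come from
packageVersion("ggplot2")
```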
I realise that in the geom_histogram docs it states:
geom_histogram is an alias for geom_bar plus stat_bin
but to be honest, I'm not really sure what this means, since my understanding of ggplot2 is that both stat_bin and geom_bar are layers (with a slightly different emphasis).
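One way to read "an alias for geom_bar plus stat_bin" is that every layer pairs a geom (how to draw) with a stat (what to compute first). A sketch, assuming a recent ggplot2 in which a layer object exposes its geom and stat as the ggproto fields $geom and $stat:

```r
library(ggplot2)

hist_layer <- geom_histogram()
bar_layer <- geom_bar()

# Both layers draw with the same geom...
class(hist_layer$geom)[1]  # "GeomBar"
class(bar_layer$geom)[1]   # "GeomBar"

# ...but compute different statistics before drawing
class(hist_layer$stat)[1]  # "StatBin"
class(bar_layer$stat)[1]   # "StatCount" (ggplot2 >= 2.0.0)
```

So in current versions geom_histogram() is the bar geom combined with the binning stat, while plain geom_bar() is the same geom combined with a per-level count.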
The default behavior is the same for both geom_bar and geom_histogram. This is because (as @csgillespie mentioned) there is an implied stat_bin when you call geom_histogram (understandable), and it is also the default statistical transformation applied to geom_bar (arguable behavior, IMO). That's why you need to specify stat='identity' when you want to plot the data as is.
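A small sketch of plotting already-summarised values as is. The totals data frame below is hypothetical, made up for illustration; geom_col() is the shorthand for geom_bar(stat = "identity") available in ggplot2 2.2.0 and later.

```r
library(ggplot2)

# Hypothetical pre-summarised data: the bar heights are already known
totals <- data.frame(
  fruit = c("apple", "banana", "cherry"),
  n = c(10, 4, 7)
)

# stat = "identity" uses n directly instead of counting rows
p_identity <- ggplot(totals, aes(x = fruit, y = n)) + geom_bar(stat = "identity")

# geom_col() builds the same layer with less typing
p_col <- ggplot(totals, aes(x = fruit, y = n)) + geom_col()

# Bar heights come straight from the n column
layer_data(p_identity)$y
layer_data(p_col)$y
```

Without stat = "identity", a current geom_bar() would try to count rows and reject the y mapping, because its default stat_count() accepts only one of x or y.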
The stat='bin' (or stat_bin()) is a statistical transformation that ggplot does for you. As output, it gives you the variables surrounded by two dots (..count.. and ..density..). If you don't specify stat='bin', you won't get those variables.
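A sketch of using one of those computed variables, assuming ggplot2 3.3.0 or later, where after_stat(density) is the newer spelling of ..density.. (the dot-dot form still works but is deprecated in recent releases). The mpg dataset ships with ggplot2 and is used only for illustration.

```r
library(ggplot2)

# Map y to the density computed by stat_bin instead of the default count
p <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10)

d <- layer_data(p)

# stat_bin adds the bin edges plus count and density columns
head(d[, c("xmin", "xmax", "count", "density")])

# Density times bin width sums to 1 (up to floating-point error)
sum(d$density * (d$xmax - d$xmin))
```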
geom_bar() is for categorical data: the x values are a factor with distinct levels, so there are gaps between the bars.
geom_histogram() is for a single continuous variable. Usually we put it on the x-axis (the bars touch each other because the bins are contiguous), and the y-axis shows how many observations fall in each bin.
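The gaps versus touching bars can be checked from the computed layer data. A sketch, assuming a recent ggplot2 and using its bundled mpg dataset for illustration; by default geom_bar() makes each bar 90% of the spacing between categories.

```r
library(ggplot2)

bar_data <- layer_data(ggplot(mpg, aes(x = class)) + geom_bar())
hist_data <- layer_data(ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 10))

# Each bar is 0.9 units wide on a 1-unit category spacing, leaving gaps
as.numeric(bar_data$xmax - bar_data$xmin)

# Histogram bins share edges: each bin ends where the next one starts
n <- nrow(hist_data)
all.equal(as.numeric(hist_data$xmax[-n]), as.numeric(hist_data$xmin[-1]))
```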
When you have one categorical and one continuous variable, another option is geom_boxplot(). Usually we use the y-axis for the continuous data, so each box-and-whisker is vertical.
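A minimal sketch with the categorical variable on x and the continuous one on y, assuming a recent ggplot2 and its bundled mpg dataset (used here only for illustration):

```r
library(ggplot2)

# One vertical box per vehicle class, summarising highway mpg
p_box <- ggplot(mpg, aes(x = class, y = hwy)) + geom_boxplot()

box_data <- layer_data(p_box)

# stat_boxplot computes whisker ends (ymin, ymax), hinges (lower, upper)
# and the median (middle) for each class
box_data[, c("ymin", "lower", "middle", "upper", "ymax")]
```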