18
votes

What is the difference (if any) between geom_bar and geom_histogram in ggplot? They seem to produce the same plot and take the same parameters.

3
If you look at ?geom_histogram you will find that "geom_histogram is an alias for geom_bar plus stat_bin". - Didzis Elferts
Speaking as a mathematician :-), a histogram is different from a bar chart, even though the names tend to get intermingled. Quoting from Wikipedia, "A histogram consists of tabular frequencies, shown as adjacent rectangles, erected over discrete intervals (bins), with an area equal to the frequency of the observations in the interval. The height of a rectangle is also equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data." A bar plot has no such area restrictions. - Carl Witthoft
Thanks. Although it seems that geom_bar() also has a stat_bin() applied to it, as you can get access to the stat_bin variables like ..count.. and ..density.. - jamborta

3 Answers

16
votes
  • Bar charts provide a visual presentation of categorical data. Examples:
    • The number of people with red, black and brown hair
    • Look at the geom_bar help file. The examples are all counts.
    • Wikipedia page
  • Histograms are used to plot the density of interval (usually numeric) data. Examples:
    • Distributions of age and height
    • The geom_histogram help file. The examples are distributions of movie ratings.
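
To make the distinction above concrete, here is a minimal sketch using the mpg data set that ships with ggplot2 (my choice of data and of binwidth = 5, not something from the help files). It draws a bar chart of a categorical variable and a histogram of a continuous one, then uses layer_data() to look at what each layer computed. Note that this assumes a reasonably recent ggplot2, where, as far as I know, geom_bar counts distinct x values rather than binning them.

```r
library(ggplot2)

# Categorical variable: one bar per vehicle class, height = number of rows.
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Continuous variable: highway mileage is grouped into bins of width 5 first.
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 5)

# layer_data() returns the data frame each layer actually draws.
bar_data  <- layer_data(p_bar)
hist_data <- layer_data(p_hist)

# One row per category for the bar chart, one row per bin for the histogram.
nrow(bar_data)
nrow(hist_data)
```

In both cases the counts add up to the number of rows in mpg; the difference is whether the x-axis is split by category or by interval.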

ggplot2

After a bit more investigating, I think in ggplot2 there is no difference between geom_bar and geom_histogram. From the docs:

 geom_histogram(mapping = NULL, data = NULL, stat = "bin",
    position = "stack", ...)
 geom_bar(mapping = NULL, data = NULL, stat = "bin",
    position = "stack", ...)

I realise that in the geom_histogram docs it states:

geom_histogram is an alias for geom_bar plus stat_bin

but to be honest, I'm not really sure what this means, since my understanding of ggplot2 is that both stat_bin and geom_bar are layers (with a slightly different emphasis).
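
One way to read "geom_bar plus stat_bin" is as a stat_bin() layer drawn with the bar geom. The sketch below compares that combination with geom_histogram() on the mpg data (the data set and binwidth = 5 are my own assumptions for illustration). If the alias reading is right, both layers should compute the same bins and counts.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = hwy))

# stat_bin() does the binning, geom = "bar" draws the result as bars.
p_manual <- base + stat_bin(geom = "bar", binwidth = 5)

# geom_histogram() should set up the same stat and geom in one call.
p_alias <- base + geom_histogram(binwidth = 5)

manual_data <- layer_data(p_manual)
alias_data  <- layer_data(p_alias)

# Compare the bin edges and counts computed by the two layers.
all.equal(manual_data$count, alias_data$count)
all.equal(manual_data$xmin, alias_data$xmin)
```

So the stat decides what gets computed (bin counts) and the geom decides how it is drawn (rectangles); geom_histogram just bundles the two.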

3
votes

The default behavior is the same for both geom_bar and geom_histogram. This is because (as @csgillespie mentioned) there is an implied stat_bin when you call geom_histogram (understandable), and it is also the default statistical transformation applied to geom_bar (arguable behavior, IMO). That's why you need to specify stat='identity' when you want to plot the data as is.

The stat='bin' or stat_bin() is a statistical transformation that ggplot does for you. It provides as output the variables surrounded by two dots (..count.. and ..density..). If the layer uses a different stat (for example stat='identity'), you won't get those variables.
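
A rough sketch of both points, with made-up hair-colour counts (the hair data frame is hypothetical) and the mpg data from ggplot2. The first plot uses stat = "identity" so the bar heights are taken straight from the data. The second maps the computed density to y; I have written it as after_stat(density), which, as far as I know, is how recent ggplot2 releases spell the older ..density.. notation.

```r
library(ggplot2)

# Hypothetical pre-summarised data: the counts are already computed.
hair <- data.frame(
  colour = c("red", "black", "brown"),
  n      = c(4, 11, 9)
)

# stat = "identity": use n as the bar height, with no counting or binning.
p_identity <- ggplot(hair, aes(x = colour, y = n)) +
  geom_bar(stat = "identity")

# Binning makes computed variables available; here density replaces the
# default count on the y-axis (binwidth = 5 is an arbitrary choice).
p_density <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5)

identity_data <- layer_data(p_identity)
density_data  <- layer_data(p_density)

# With density on the y-axis, bar area (height * width) sums to 1.
sum(density_data$density * (density_data$xmax - density_data$xmin))
```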

0
votes

geom_bar() is for when both the x- and y-values are categorical data, so there are spaces between the bars because the x-values are a factor with distinct levels.

geom_histogram() is for one continuous variable and one categorical variable. Usually we put the continuous data on the x-axis (so the bars touch each other, as they are continuous) and the categorical data on the y-axis.

There is another plot we can use to show the above situation (one categorical, one continuous variable): geom_boxplot(). Usually we use the y-axis to represent the continuous data, as it's going to be a vertical box-and-whisker plot.