The classic example of a histogram is: x = defined bins of some continuous variable, y = frequency of those bins occurring.
My situation:
I have a data set with one column as U.S. zip codes and other columns with various statistics about those zip codes (two of which are median_household_income and population).
I want to make a histogram-type plot where the x axis is bins of the variable median_household_income (in increments of, say, $10,000) and where the y axis is something other than just the frequency of those bins occurring--specifically, the average population for those bins (i.e. the populations of all zips in the, say, $40,000-$60,000 bin would be averaged, and that population average would be how tall the bar is on the y axis).
The hist function, as well as the histogram functions of ggplot2, don't seem to have an option for choosing what goes on the y axis. They simply default to frequency.
I have had some luck using plyr's ddply and ggplot2's geom_bar functions, which have allowed me to put population on the y axis using the following code:
library(plyr)     # provides ddply
library(ggplot2)  # provides ggplot and geom_bar
# mean population for each distinct income value (columns are referenced by name, not as data$...)
population = ddply(data, "median_household_income", summarise, population = mean(population))
ggplot(population, aes(x = factor(median_household_income), y = population)) + geom_bar(stat = "identity")
...but that doesn't allow me to designate bin sizes and thus group zip codes. It merely produces a separate bar for every zip code in my data set (which obviously makes it impossible to average populations for bins, since there aren't any bins in the first place).
Any help?
Use the cut function to cut the continuous variable household income into ranges (factor variable) and plot those factors on x axis. – Gopala
Use geom_bar. Set your bins for your x variable with cut (or Hmisc::cut2 for more flexibility). – alistaire