26
votes

I'm using ggplot2 and am trying to generate a plot which shows the following data.

df=data.frame(score=c(4,2,3,5,7,6,5,6,4,2,3,5,4,8),
              age=c(18,18,23,50,19,39,19,23,22,22,40,35,22,16))
str(df)
df

Instead of doing a frequency plot of the variables (see the code below), I want to generate a plot of the average values for each x value. So I want to plot the average score at each age level. At age 18 on the x axis, we might have a 3 on the y axis for score. At age 23, we might have an average score of 4.5, and so forth (Edit: average values corrected). This would ideally be represented with a barplot.

ggplot(df, aes(x=factor(age), y=factor(score))) + geom_bar()
Error: stat_count() must not be used with a y aesthetic.

Just not sure how to do this in R with ggplot2 and can't seem to find anything on such plots. Statistically, I don't know if the plot I want is even the right thing to do, but that's a different story.

Thanks!

3
Did you want average values, because from your dataset average values at 18 age is 3 (not 3.5), and at 23 age - 4.5 (not 6.2)? - DrDom
Yeah, I want averages. In that example, I just made up some numbers w/o thinking about it. - ATMathew
@ATMathew, but since you're making the effort of providing some sample data, you should also make sure that your sample output is accurate for the provided data. Otherwise, it leads to unnecessary confusion.... - A5C1D2H2I1M1N2O1R2T1
Just as a comment: in case you have different groups, like gender, and you want a plot with the group means on it, use aes(x=factor(age), y=score, group = gender, color = gender). group separates the samples; color just gives them different colors and a legend. - Jason Goal

3 Answers

59
votes

You can use summary functions in ggplot. Here are two ways of achieving the same result:

# Option 1
ggplot(df, aes(x = factor(age), y = score)) + 
  geom_bar(stat = "summary", fun = "mean")

# Option 2
ggplot(df, aes(x = factor(age), y = score)) + 
  stat_summary(fun = "mean", geom = "bar")


Versions of ggplot2 before 3.3.0 use fun.y instead of fun:

ggplot(df, aes(x = factor(age), y = score)) + 
  stat_summary(fun.y = "mean", geom = "bar")
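
If you want to check what the summary stat will draw, you can compute the per-age means yourself. This is just a sketch in base R using tapply() on the df from the question; the name age_means is my own, not something ggplot2 exposes.

```r
# Sample data from the question
df <- data.frame(score = c(4, 2, 3, 5, 7, 6, 5, 6, 4, 2, 3, 5, 4, 8),
                 age   = c(18, 18, 23, 50, 19, 39, 19, 23, 22, 22, 40, 35, 22, 16))

# Mean score within each age; the result is a named vector keyed by age
age_means <- tapply(df$score, df$age, mean)
age_means
```

The bar heights in the plots above should match these values, for example 3 at age 18 and 4.5 at age 23.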
8
votes

If I understood you right, you could try something like this:

library(plyr)
library(ggplot2)
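# Average the score within each age first, then plot the precomputed means as-is
avg <- ddply(df, .(age), summarise, score = mean(score))
ggplot(avg, aes(x = factor(age), y = score)) + geom_bar(stat = "identity")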
6
votes

You can also use aggregate() in base R instead of loading another package.

temp = aggregate(list(score = df$score), list(age = factor(df$age)), mean)
# stat = "identity" plots the precomputed means instead of counting rows
ggplot(temp, aes(x = age, y = score)) + geom_bar(stat = "identity")
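
The same idea extends to the grouped case mentioned in the comments. Below is a rough sketch using aggregate()'s formula interface; note that the gender column is made up purely for illustration (the question's data has no such column), and the names df2 and grouped are mine as well.

```r
library(ggplot2)

# Sample data from the question, plus a HYPOTHETICAL gender column
# (alternating "F"/"M") added only to illustrate grouping
df2 <- data.frame(score  = c(4, 2, 3, 5, 7, 6, 5, 6, 4, 2, 3, 5, 4, 8),
                  age    = c(18, 18, 23, 50, 19, 39, 19, 23, 22, 22, 40, 35, 22, 16),
                  gender = rep(c("F", "M"), length.out = 14))

# Mean score for each age/gender combination that occurs in the data
grouped <- aggregate(score ~ age + gender, data = df2, FUN = mean)

# geom_col() plots the y values as given; "dodge" puts the groups side by side
p <- ggplot(grouped, aes(x = factor(age), y = score, fill = gender)) +
  geom_col(position = "dodge")
print(p)
```

geom_col() is shorthand for geom_bar(stat = "identity"), so either works here.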