This question follows on from an earlier question and its answers.

First some toy data:

df = read.table(text = 
"School      Year    Value 
 A           1998    5
 B           1999    10
 C           2000    15
 A           2000    7
 B           2001    15
 C           2002    20", sep = "", header = TRUE)

The original question asked how to plot Value-Year lines for each School. The answers more or less correspond to p1 and p2 below. But also consider p3.

library(ggplot2)

(p1 <- ggplot(data = df, aes(x = Year, y = Value, colour = School)) +       
   geom_line() + geom_point())

(p2 <- ggplot(data = df, aes(x = factor(Year), y = Value, colour = School)) +       
  geom_line(aes(group = School)) + geom_point())

(p3 <- ggplot(data = df, aes(x = factor(Year), y = Value, colour = School)) +       
  geom_line() + geom_point())

Both p1 and p2 do the job. They differ in two ways: p1 treats Year as numeric, whereas p2 treats Year as a factor and adds a group aesthetic in geom_line. When the group aesthetic is dropped, as in p3, the points are plotted but no lines are drawn.

The question is: why is the group aesthetic necessary when the x-axis variable is a factor, but not when it is numeric?

1 Answer

In the words of Hadley himself:

The important thing [for a line graph with a factor on the horizontal axis] is to manually specify the grouping. By default ggplot2 uses the combination of all categorical variables in the plot to group geoms - that doesn't work for this plot because you get an individual line for each point. Manually specify group = 1 indicates you want a single line connecting all the points.
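To see the default grouping described above in action, one can count the groups ggplot2 assigns to the line layer. This is only a sketch: it rebuilds the toy data from the question, n_groups is a hypothetical helper name, and the plot names (p_numeric, p_factor, p_grouped) are made up for illustration. It relies on layer_data(), which returns the computed data for a layer, including its group column.

```r
library(ggplot2)

# Toy data from the question
df <- read.table(text = "School Year Value
A 1998 5
B 1999 10
C 2000 15
A 2000 7
B 2001 15
C 2002 20", header = TRUE)

# Hypothetical helper: number of distinct groups in the first layer of a plot
n_groups <- function(p) length(unique(layer_data(p, 1)$group))

# Numeric x: School is the only categorical variable, so one group per School
p_numeric <- ggplot(df, aes(Year, Value, colour = School)) + geom_line()

# Factor x, no explicit group: groups are School x factor(Year) combinations,
# so every observation ends up alone in its own group
p_factor <- ggplot(df, aes(factor(Year), Value, colour = School)) + geom_line()

# Factor x with an explicit group: back to one group per School
p_grouped <- ggplot(df, aes(factor(Year), Value, colour = School)) +
  geom_line(aes(group = School))

n_groups(p_numeric)  # 3
n_groups(p_factor)   # 6
n_groups(p_grouped)  # 3
```

A line needs at least two points per group, which is why the six single-point groups in the ungrouped factor version produce no visible lines.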

You can actually group the points in very different ways, as demonstrated by koshke here.
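As a rough illustration of how the choice of group changes the result (this is not koshke's linked example, just a sketch using the toy data from the question, with made-up plot names), compare a single line through all points with one line per School:

```r
library(ggplot2)

# Toy data from the question
df <- read.table(text = "School Year Value
A 1998 5
B 1999 10
C 2000 15
A 2000 7
B 2001 15
C 2002 20", header = TRUE)

base <- ggplot(df, aes(x = factor(Year), y = Value))

# group = 1: all observations belong to one group, so a single line
# connects every point regardless of School
p_one <- base +
  geom_line(aes(group = 1)) +
  geom_point(aes(colour = School))

# group = School: one line per School, coloured to match the points
p_school <- base +
  geom_line(aes(group = School, colour = School)) +
  geom_point(aes(colour = School))

p_one
p_school
```

Setting group inside geom_line rather than in the top-level ggplot() call keeps the grouping local to the lines, so the points layer is unaffected.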