I am following Chapter 1 of Wickham and Grolemund's "R for data science" on visualization.

I have tried:

 ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

hoping to get a plot with all points colored blue, but to my surprise they were all red! The correct code for blue points, given on page 11 of the printed version or in Section 3.3 of the online version, is

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

and the authors state that, to set an aesthetic manually, you have to give it outside the aes() function but inside the corresponding geom, here geom_point(). Why is that? What exactly explains this behavior? It seemed natural to me that the correct syntax would be that of the first command. I guess the issue is related to layers and/or to the scope of variables, but I just could not get the hang of it... Can someone spoon-feed me?

Edit: Sorry for not doing my homework properly: this is just Exercise 1, proposed in the text itself at the end of the corresponding section... The answer, however, still escapes me.

2
Aesthetics can set the color based on a data element. If it's not related to a data element and you just want the geom to be a color, do that outside of the aesthetics. Here's a previous post on the matter: stackoverflow.com/questions/11511911/… – Ryan Morton

2 Answers

4
votes

This issue, and more specifically the difference in output between the two commands in the question, is dealt with explicitly in Section 5.4.2 of the 2nd edition of "ggplot2: Elegant Graphics for Data Analysis", by Hadley Wickham himself:

Either:

  • you can map (inside aes) a variable of your data to an aesthetic, e.g., aes(..., color = VarX), or ...
  • you can set (outside aes, but inside a geom element) an aesthetic to a constant value e.g. "blue"

In the first case, the aesthetic is mapped: ggplot2 treats the string "blue" as data, that is, as a discrete variable with a single level, and its default color scale assigns that level the first hue on its color wheel, which happens to be a reddish one. The constant you map from is only a label, so why should the chosen color coincide with it? More explicitly, if you try the command:

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "foo"))

you get exactly the same output plot as in the first command of the original question.
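A quick way to check this without comparing plots by eye is to look at the colours ggplot2 actually computes for the layer. The sketch below assumes ggplot2 is installed and uses its layer_data() helper; the object names (p_blue, p_foo, p_set and so on) are made up for illustration.

```r
library(ggplot2)

# Mapped: the string is treated as data with a single level.
p_blue <- ggplot(mpg) + geom_point(aes(displ, hwy, colour = "blue"))
p_foo  <- ggplot(mpg) + geom_point(aes(displ, hwy, colour = "foo"))

# Set: the constant is passed to the geom, outside aes().
p_set  <- ggplot(mpg) + geom_point(aes(displ, hwy), colour = "blue")

# layer_data() returns the data frame the geom draws from,
# including the colour resolved for every point.
col_blue <- unique(layer_data(p_blue)$colour)
col_foo  <- unique(layer_data(p_foo)$colour)
col_set  <- unique(layer_data(p_set)$colour)

print(c(mapped_blue = col_blue, mapped_foo = col_foo, set_blue = col_set))
```

With the default scales, the two mapped versions should report the same palette colour, while only the set version reports "blue".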

2
votes

I remember how completely confused I was by this when I started using ggplot.

To build on @Mauicio Calvao's answer: use color inside aes to split the colours in the plot by a variable of the data frame you are plotting, e.g.:

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = drv))

So when color (or size, linetype, or a similar aesthetic) is inside aes, it is really asking which variable should determine the colour groups. If it is a constant string (e.g. "blue"), all points are put into a single group, and the name of that group has nothing to do with the actual colour drawn.
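As an aside, if you do want a string inside aes to be read as a literal colour name, one option is ggplot2's identity scale, which skips the palette lookup. This is only a sketch of that idea, not something the book exercise asks for; p_identity and identity_colours are illustrative names.

```r
library(ggplot2)

# scale_colour_identity() tells ggplot2 to use the mapped values
# as colours directly instead of assigning palette colours to groups.
p_identity <- ggplot(mpg) +
  geom_point(aes(displ, hwy, colour = "blue")) +
  scale_colour_identity()

# Colours actually used for the points in this layer.
identity_colours <- unique(layer_data(p_identity)$colour)
print(identity_colours)
```

For a single constant colour, setting it outside aes is still the simpler choice.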

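To choose the colours yourself once the points are grouped by color inside aes, use scale_colour_manual; unnamed values are matched to the groups in order (here the drv levels 4, f and r):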

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = drv))+
    scale_colour_manual(values = c("black","blue","orange"))
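If relying on the order of the groups feels fragile, the values can also be given as a named vector, so each group is tied to its colour explicitly. A minimal sketch, assuming the three drv levels in mpg ("4", "f", "r"); drv_colours, p_named and built are illustrative names.

```r
library(ggplot2)

# Names are the levels of drv; values are the colours to draw them in.
drv_colours <- c("4" = "black", "f" = "blue", "r" = "orange")

p_named <- ggplot(mpg) +
  geom_point(aes(displ, hwy, colour = drv)) +
  scale_colour_manual(values = drv_colours)

# Count how many points received each colour.
built <- layer_data(p_named)
print(table(built$colour))
```

The counts per colour should line up with the counts per drv level in mpg.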