I need help building a ggplot that plots the mean of y for each x value as a line plot with points, and that also adds a text label to each point (using ggplot functionality), color-coded by the variable mapped to the color aesthetic. If possible, I don't want to create an intermediate data frame from the original data just to summarise the y means. I tried using fun.y, as shown in the code snippet below. An Excel chart of the output I am after is also attached.

Sample data

set.seed(1)
age_range = sample(c("ar2-15", "ar16-29", "ar30-44"), 20, replace = TRUE)
gender = sample(c("M", "F"), 20, replace = TRUE)
region = sample(c("A", "B", "C"), 20, replace = TRUE)
physi = sample(c("Poor", "Average", "Good"), 20, replace = TRUE)
height = sample(c(4,5,6), 20, replace = TRUE)
survey = data.frame(age_range, gender, region,physi,height)

ggplot code I tried

ggplot(survey, aes(x=age_range, y=height, color=gender)) + stat_summary(fun.y=mean, geom = "point")+geom_line()

Output I am getting

enter image description here

Output I am looking for

enter image description here

Try stat_summary(fun.y=mean, geom = "line") in place of geom_line(). - Sandy Muspratt
Thanks Sandy... It generates an error message: "geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?" - Jennifer
Sorry, yes. Add group = gender to the aesthetics. - Sandy Muspratt

1 Answer

Following up on @Sandy's comment, you can add the labels in the same way. Here I am using the ggrepel package so that the labels don't overlap (without having to code their locations manually). For the label text, you can read the result of the call to mean, which stat_summary returns as y, by referring to ..y.. in the aesthetics.

library(ggplot2)
library(ggrepel)  # supplies the "label_repel" geom used below

ggplot(survey, aes(x=age_range, y=height, color=gender, group = gender)) +
  stat_summary(fun.y=mean, geom = "point") +
  stat_summary(fun.y=mean, geom = "line") +
  stat_summary(aes(label = round(..y.., 2)), fun.y=mean, geom = "label_repel", segment.size = 0)

Gives

enter image description here

(Note that segment.size = 0 is to ensure that there is not an additional line drawn from the point to the label.)

As of now, it does not appear that ggrepel offers text displacement along only one axis (see here), so you may have to position the labels manually if you want more precision.
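
For readers on newer package versions, here is a hedged sketch of the same plot. It assumes ggplot2 3.4.0 or later, where fun and after_stat(y) replace the deprecated fun.y and ..y.., and a ggrepel release that accepts a direction argument (added after this answer was written) to restrict label displacement to one axis. The data are the question's sample data, and p is just an illustrative name.

```r
library(ggplot2)
library(ggrepel)

# Sample data from the question
set.seed(1)
age_range <- sample(c("ar2-15", "ar16-29", "ar30-44"), 20, replace = TRUE)
gender <- sample(c("M", "F"), 20, replace = TRUE)
region <- sample(c("A", "B", "C"), 20, replace = TRUE)
physi <- sample(c("Poor", "Average", "Good"), 20, replace = TRUE)
height <- sample(c(4, 5, 6), 20, replace = TRUE)
survey <- data.frame(age_range, gender, region, physi, height)

# Assumption: ggplot2 >= 3.4.0 (fun, after_stat) and a ggrepel version
# that supports direction = "y" (labels move vertically only).
p <- ggplot(survey, aes(x = age_range, y = height, color = gender, group = gender)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(aes(label = round(after_stat(y), 2)), fun = mean,
               geom = "label_repel", direction = "y", segment.size = 0)
p
```

On older installations, the original fun.y and ..y.. spelling above is the one to use, and direction may not be available.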

If you want to set the label locations manually, here is an approach that uses dplyr and the %>% pipe to avoid having to save any intermediate data.frames.

The basic idea is described here. To see the result after any step, highlight the code up to just before the %>% at the end of a line and run it.

First, group_by the x location and the grouping variable that you want to plot, and get the average of each combination using summarise. After that, the data are still grouped by age_range (summarise only peels off one level of grouping at a time), so mean(height) inside mutate is the mean across genders within each age_range. Subtracting it from each group's mean tells you which gender has the higher mean at that point. I used sign just to pull out whether the difference is positive or negative, then multiplied/divided by a factor to get the spacing I wanted (in this case, divided by ten to get a spacing of 0.1). Add that adjustment to the mean to set where you want the label to land.

Then pass all of that into ggplot and proceed as you would with any other data.frame.

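library(dplyr)
library(ggplot2)

survey %>%
  # one row per age_range/gender combination
  group_by(age_range, gender) %>%
  summarise(height = mean(height)) %>%
  # still grouped by age_range here, so mean(height) is taken across genders
  mutate(myAdj = sign(height - mean(height)) / 10
         , labelLoc = height + myAdj) %>%
  ungroup() %>%
  ggplot(aes(x = age_range
             , y = height
             , label = round(height, 2)
             , color = gender
             , group = gender
  )) +
  geom_point() +
  geom_line() +
  # labels are drawn at the shifted position, points and lines at the mean
  geom_label(aes(y = labelLoc)
             , show.legend = FALSE)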

Gives:

enter image description here

This seems to accomplish your basic goals, though you may want to play around with the spacing etc. for your actual use case.
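
To check the offset arithmetic by hand, here is a minimal base R sketch of the same mean, sign, and shift steps. The toy data frame and its values are hypothetical, chosen only so the group means are easy to verify; aggregate and ave stand in for the summarise and grouped mutate steps and are not part of the approach above.

```r
# Hypothetical toy data: two age ranges, two genders.
toy <- data.frame(
  age_range = c("a", "a", "a", "a", "b", "b", "b", "b"),
  gender    = c("F", "F", "M", "M", "F", "F", "M", "M"),
  height    = c(4, 6, 4, 4, 5, 5, 6, 6)
)

# Step 1: mean height per age_range/gender combination.
means <- aggregate(height ~ age_range + gender, data = toy, FUN = mean)

# Step 2: within each age_range, compare each group mean with the
# average of the group means, keep only the sign, and scale to 0.1.
centre <- ave(means$height, means$age_range, FUN = mean)
means$myAdj <- sign(means$height - centre) / 10

# Step 3: shift the label position away from the other group.
means$labelLoc <- means$height + means$myAdj

means <- means[order(means$age_range, means$gender), ]
print(means)
```

Here the group with the higher mean in each age range gets its label 0.1 above its point and the other gets it 0.1 below. If two groups have exactly the same mean, sign returns 0 and the labels are not separated, so that case would still need manual handling.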