0
votes

I want to add multiple vertical lines in my density plot that start at the x-axis and end at the curve using ggplot2. I'm using the starwars dataset from dplyr. I want to plot the height variable as a normal distribution. The dashed lines inside the curve represent the standard deviations. So far I got this (just the plot without the lines):

sd.values = seq(66, 264, 34.77043)
zeros.vector = rep(0, 6)

ggplot(starwars, aes(x=height, y=dnorm(height, m=mean(height, na.rm=T), s=sd(height, na.rm=T)))) +
  geom_line() + labs(x='height', y='f(height)') +
  scale_x_continuous(breaks=sd.values,labels=sd.values)

[Image: density plot without lines]

Now, I want to add the dashed lines using geom_segment:

ggplot(starwars, aes(x=height, y=dnorm(height, m=mean(height, na.rm=T), s=sd(height, na.rm=T))))+
  geom_line() + labs(x='height', y='f(height)') +
  scale_x_continuous(breaks=sd.values, labels=sd.values) +
  geom_segment((aes(x=sd.values, y=zeros.vector, xend=sd.values,
                    yend=dnorm(sd.values, m=mean(height, na.rm=T), s=sd(height, na.rm=T)))),
               linetype ='dashed')

But in the end, I only get the following error message:

Error: Aesthetics must be either length 1 or the same as the data (87): x, y, xend and yend

Any idea what I have to change in order to add the dashed lines?


2 Answers

2
votes

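The segments need their own data.frame (or tibble), passed through the data argument of geom_segment(). It can have different dimensions from the dataset given to ggplot(). Setting inherit.aes = FALSE also stops the layer from picking up the global x and y mappings. For example: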

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
sd.values = seq(66, 264, 34.77043)
# zeros.vector = rep(0, 6)

ggplot(starwars, aes(x=height, y=dnorm(height, m=mean(height, na.rm=T), s=sd(height, na.rm=T))))+
    geom_line() + labs(x='height', y='f(height)') +
    scale_x_continuous(breaks=sd.values, labels=sd.values) +
    geom_segment(mapping = aes(x=SD, y=Zeros, xend=SD,
                      yend=dnorm(SD, m=mean(starwars$height, na.rm=T), s=sd(starwars$height, na.rm=T))),
                 linetype ='dashed', inherit.aes = F, data=data.frame(SD=sd.values, Zeros=rep(0, 6)))
#> Warning: Removed 6 row(s) containing missing values (geom_path).

Created on 2020-12-27 by the reprex package (v0.3.0)
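
The number in the error message from the question (87) is the row count of starwars, while sd.values holds only six positions. A quick check along the following lines makes the mismatch visible. It uses only objects already defined in the question, apart from seg, a name introduced here purely for illustration:

```r
library(dplyr)  # provides the starwars dataset

sd.values <- seq(66, 264, 34.77043)

# Rows in the default dataset vs. number of segments requested
n_data     <- nrow(starwars)
n_segments <- length(sd.values)
c(data = n_data, segments = n_segments)

# A separate data frame gives the segment layer its own row count;
# Zeros = 0 is recycled to match SD.
seg <- data.frame(SD = sd.values, Zeros = 0)
nrow(seg) == n_segments
```

Because seg has as many rows as there are segments, every aesthetic mapped from it has a consistent length, which is what geom_segment() checks against.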

2
votes

When you specify the data argument in ggplot(), it becomes the default dataset for every layer. Every aesthetic expression must then have length 1 or the same length as that dataset, unless you supply a different dataset to the geom. To avoid setting a default dataset, specify the data argument in the individual geoms instead.

library(tidyverse)

data(starwars)

sd.values <-  seq(66, 264, 34.77043)
mean_height <-  mean(starwars$height, na.rm = TRUE)
sd_height <-  sd(starwars$height, na.rm = TRUE)

ggplot() + 
  geom_line(data = starwars, 
            aes(x = height, y = dnorm(height, m = mean_height, sd = sd_height))) + 
  geom_segment(data = NULL, 
               aes(x = sd.values, xend = sd.values, 
                   y = 0, yend = dnorm(sd.values, m = mean_height, sd = sd_height)),
               linetype = 'dashed')

[Image: distribution graph]

Note, though, that the following call fails even though it specifies data = NULL, because ggplot2 replaces a NULL layer dataset with the default, starwars.

ggplot(data = starwars, aes(x = height, y = dnorm(height, m = mean_height, sd = sd_height))) + 
  geom_line() + 
  geom_segment(data = NULL, 
               aes(x = sd.values, xend = sd.values, 
                   y = 0, yend = dnorm(sd.values, m = mean_height, sd = sd_height)))

Alternatively, you can create a new dataset for the segments and pass it to geom_segment().

library(tidyverse)

data(starwars)

mean_height <-  mean(starwars$height, na.rm = TRUE)
sd_height <-  sd(starwars$height, na.rm = TRUE)

df <- data.frame(
  sd_values = seq(66, 264, 34.77043)
) %>% mutate(yend = dnorm(sd_values, mean_height, sd_height))


ggplot() + 
  geom_line(data = starwars, 
            aes(x = height, y = dnorm(height, m = mean_height, sd = sd_height))) + 
  geom_segment(data = df, 
               aes(x = sd_values, xend = sd_values, 
                   y = 0, yend = yend),
               linetype = 'dashed')
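
A further option, not taken from the answers above, is ggplot2's annotate(), which accepts plain vectors and does not use the plot's default dataset. The following is a minimal sketch under that assumption; the object name p is introduced here for illustration only:

```r
library(ggplot2)
library(dplyr)  # provides the starwars dataset

mean_height <- mean(starwars$height, na.rm = TRUE)
sd_height   <- sd(starwars$height, na.rm = TRUE)
sd.values   <- seq(66, 264, 34.77043)

p <- ggplot(starwars,
            aes(x = height,
                y = dnorm(height, mean = mean_height, sd = sd_height))) +
  geom_line() +
  # annotate() builds its own small data frame from these vectors,
  # so their length does not have to match nrow(starwars).
  annotate("segment",
           x = sd.values, xend = sd.values,
           y = 0, yend = dnorm(sd.values, mean = mean_height, sd = sd_height),
           linetype = "dashed") +
  labs(x = "height", y = "f(height)")

p
```

This keeps the default dataset in ggplot() while still drawing one segment per entry of sd.values. A separate data frame, as in the examples above, is probably the better fit when the segments need their own aesthetic mappings or a legend.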