Plot time series with known error (ggplot2)

Question

I'm working with American Community Survey (ACS) 1-year estimates for a specific location over several years. For example, I'm trying to plot how the proportion of men and women riding a bike to work changes over time. From the ACS, I get estimates and standard error, which I can then use to calculate the upper and lower bounds of the estimates.

So the simplified data structure in wide format is like this:

| Year | EstimateM | MaxM | MinM | EstimateF | MaxF | MinF |
|------|-----------|------|------|-----------|------|------|
| 2005 | 3.0       | 3.5  | 2.5  | 2.0       | 2.3  | 1.7  |
| 2006 | 3.1       | 3.5  | 2.6  | 2.0       | 2.3  | 1.7  |
| 2007 | 5.0       | 4.2  | 5.8  | 2.5       | 3.0  | 2.0  |
| ...  | ...       | ...  | ...  | ...       | ...  | ...  |

If I only wanted to plot the estimates, I'd melt the data with only the two Estimate variables as measure.vars

GenderModeCombined_long <- melt(GenderModeCombined,
                            id = "Year",
                            measure.vars = c("EstimateM",
                                             "EstimateF")

The long data can then be easily plotted with ggplot2

ggplot(data=GenderModeCombined_long,
  aes(x=year, y=value, colour=variable)) +
  geom_point() +
  geom_line()

This produces a graph like so

Imgur

(sorry, don't have enough rep to post images)

Where I'm stuck is how to add error bars to the two estimate graphs. I could add them as measure vars to the melted dataset, but then how do I tell ggplot what should be plotted as values and what as error bars? Do I have to create a separate data frame with just the min/max data and then load that separately?

geom_errorbar(data = errordataMmax, aes(ymax = ??, ymin = ??))

I have the feeling that I'm somehow approaching this the wrong way and/or have my data set up the wrong way.

If you can make this question reproducible, you are much more likely to get a useful answer.. — Axeman

lbusett lbusett · Accepted Answer · 2018-12-28T21:25:46

Welcome to SO. The problem here is that you have three "explicit" variables (Estimate, Min and Max) and an "implicit" one (gender) which is coded in column names. A way to solve this is to make "gender" an explicit grouping variable. After you go to long format, create a "gender" variable, remove the indication of gender from the key column (variable) and then go back to wide format. Something like this would work:

library(ggplot2)
library(dplyr)
library(tidyr)
library(tibble)

GenderModeCombined <- tibble::tribble(
  ~Year,   ~EstimateM,   ~MaxM,   ~MinM,   ~EstimateF,   ~MaxF,   ~MinF,  
  2005,         3.0,    3.5,    2.5,         2.0,    2.3,    1.7,  
  2006,         3.1,    3.5,    2.6,         2.0,    2.3,    1.7,  
  2007,         5.0,    4.2,    5.8,         2.5,    3.0,    2.0
)

GenderModeCombined.long <- GenderModeCombined %>% 
  # switch to long format
  tidyr::gather(variable, value, -Year,  factor_key = TRUE) %>% 
  # add a gender variable
  dplyr::mutate(gender   = stringr::str_sub(variable, -1)) %>% 
  # remove gender indication from the key column `variable`
  dplyr::mutate(variable = stringr::str_sub(variable, end = -2)) %>%
  # back to wide format
  tidyr::spread(variable, value)

GenderModeCombined.long
#> # A tibble: 6 x 5
#>    Year gender Estimate   Max   Min
#>   <dbl> <chr>     <dbl> <dbl> <dbl>
#> 1  2005 F           2     2.3   1.7
#> 2  2005 M           3     3.5   2.5
#> 3  2006 F           2     2.3   1.7
#> 4  2006 M           3.1   3.5   2.6
#> 5  2007 F           2.5   3     2  
#> 6  2007 M           5     4.2   5.8

ggplot(data=GenderModeCombined.long,
       aes(x=Year, y=Estimate,colour = gender)) +
  geom_point() +
  geom_line() + 
  geom_errorbar(aes(ymax = Max, ymin = Min))

^{Created on 2018-12-29 by the reprex package (v0.2.1)}

Plot time series with known error (ggplot2)

2 Answers

Data