
I have an example data frame whose columns have unequal lengths (some have 5 data points, some have 3, and so on). I have a loop that generates a ggplot for each column, but I can't figure out how to dynamically shorten the x-axis range of a plot when that column has missing data.

Data Example:

        date        X1        X2        X3
1 1997-01-31 0.6094410        NA 0.5728303
2 1997-03-03 0.7741195        NA 0.0582721
3 1997-03-31 0.7269925 0.5628813 0.8270764
4 1997-05-01 0.5471391 0.5381265 0.8678812
5 1997-05-31 0.8056487 0.4129166 0.6582061

Code so far:

library(ggplot2)

vars <- colnames(data[-1])
plots <- list()

for (x in seq_along(vars)) {
  # Store each plot in the `plots` list (not `plot`, which is a base R function)
  plots[[x]] <- ggplot(data = data, aes_q(x = data[, 1], y = data[, x + 1])) + 
    geom_line()
}

Plotting the first plot yields a good result:

Plot 1

But, plotting the second plot yields this short line:

Plot 2

How can I change my loop so that the second plot is this?:

Plot 3

Thank you in advance! Any help is appreciated

1) stop using data as an object name. 2) subset your argument passed to the data parameter. At the moment you are giving an entire column of dates to the plotting routine. - IRTFM
What happens if you add na.omit(data) to the geom_line call? - bob1

1 Answer


Before you specify which column you want for the y-axis, ggplot will prepare to map to the whole data frame. So if you just enter ggplot(data, aes(x = date)), you'll already get a blank plot with that range:


So if you don't want some series to print the whole range, you have to filter the data set first, to the rows that are defined for the column you're going to use for the y values. For instance, you could create the X2 plot using:

temp <- data[complete.cases(data[c(1,3)]), c(1,3)]
ggplot(temp, aes(x = date, X2)) + geom_line()
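
The same filter can be applied inside the OP's loop so that each plot only sees the rows defined for its own column. This is a minimal sketch, not a definitive implementation: the data frame is rebuilt from the sample in the question under the stand-in name `df`, and the y column is renamed to `value` (an assumption made here) so that `aes()` does not depend on the loop variable when the plot is eventually drawn.

```r
library(ggplot2)

# Sample data rebuilt from the question; `df` is a stand-in name
df <- data.frame(
  date = as.Date(c("1997-01-31", "1997-03-03", "1997-03-31",
                   "1997-05-01", "1997-05-31")),
  X1 = c(0.6094410, 0.7741195, 0.7269925, 0.5471391, 0.8056487),
  X2 = c(NA, NA, 0.5628813, 0.5381265, 0.4129166),
  X3 = c(0.5728303, 0.0582721, 0.8270764, 0.8678812, 0.6582061)
)

vars <- setdiff(names(df), "date")
plots <- list()

for (v in vars) {
  # Keep only the rows where this column is not NA
  keep <- !is.na(df[[v]])
  temp <- data.frame(date = df$date[keep], value = df[[v]][keep])

  # `value` is a fixed column name, so the mapping does not change
  # as `v` changes in later iterations
  plots[[v]] <- ggplot(temp, aes(x = date, y = value)) +
    geom_line() +
    labs(y = v)
}
```

With this, `plots[["X2"]]` is built from the three rows where X2 is defined, so its x-axis starts at 1997-03-31 instead of 1997-01-31.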

I like to do this using dplyr and tidyr:

library(dplyr); library(tidyr)
temp <- data %>% select(date, X2) %>% drop_na()
ggplot(temp, aes(x = date, X2)) + geom_line()


To do this for all variables, here's an approach using dplyr and tidyr with purrr:

library(ggplot2); library(purrr); library(dplyr); library(tidyr)
plots <- data %>% 
  # Convert to long form and remove NA rows
  gather(var, value, -date) %>%
  drop_na() %>%

  # For each variable, nest all the available data
  group_by(var) %>%
  nest() %>%

  # Make a plot based on each nested data, where we'll use the
  #   data as the first parameter (.x), and var as the second
  #   parameter (.y), feeding those into ggplot.
  mutate(plot = map2(data, var, 
                     ~ggplot(data = .x, aes(date, value)) +
                       geom_line() +
                       labs(title = .y, y = .y)))

# At this point we have a nested table, with data and plots for each variable:
plots
# A tibble: 3 x 3
  var   data             plot    
  <chr> <list>           <list>  
1 X1    <tibble [5 x 2]> <S3: gg>
2 X2    <tibble [3 x 2]> <S3: gg>
3 X3    <tibble [5 x 2]> <S3: gg>

# To get a plain list of plots like the OP's, extract just the plot column with
plots <- plots %>% pluck("plot")
plots

plots[[1]]
plots[[2]] # or use `plots %>% pluck(2)`
plots[[3]]

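
In current tidyr, `gather()` is superseded by `pivot_longer()`, which can drop the NA rows in the same step via `values_drop_na = TRUE`. The following is a rough equivalent of the pipeline above, offered as a sketch: `df` is again a stand-in rebuilt from the question's sample, and base R `split()` plus `lapply()` are swapped in for `nest()` and `map2()`.

```r
library(ggplot2)
library(tidyr)

# Sample data rebuilt from the question; `df` is a stand-in name
df <- data.frame(
  date = as.Date(c("1997-01-31", "1997-03-03", "1997-03-31",
                   "1997-05-01", "1997-05-31")),
  X1 = c(0.6094410, 0.7741195, 0.7269925, 0.5471391, 0.8056487),
  X2 = c(NA, NA, 0.5628813, 0.5381265, 0.4129166),
  X3 = c(0.5728303, 0.0582721, 0.8270764, 0.8678812, 0.6582061)
)

# Convert to long form, dropping rows whose value is NA
long <- pivot_longer(df, cols = -date, names_to = "var",
                     values_to = "value", values_drop_na = TRUE)

# One plot per variable, each built only from that variable's rows
plots <- lapply(split(long, long$var), function(d) {
  ggplot(d, aes(date, value)) +
    geom_line() +
    labs(title = d$var[1], y = d$var[1])
})
```

The result is a named list, so the plots can be pulled out with `plots$X2` or `plots[[2]]`.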