0
votes

I'm working with time series of cloud cover for a good number of stations. I have no values for the years 1996 to 1998. When I plot the time series, these years do not appear on the x axis (it goes straight from 1995 to 1999), so the plot doesn't show the "hole" in the dataset. I'd like to plot the dataset with this "hole" visible.

This is an example of the dataset I use, for 1 station (the full dataset has 103 different stations from 1978 to 2020, except 1996, 1997 and 1998):

list(year = c("1978", "1979", "1980", "1981", "1982", "1983", 
"1984", "1985", "1986", "1987", "1988", "1989", "1990", "1991", 
"1992", "1993", "1994", "1995", "1999", "2000", "2001", "2002", 
"2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", 
"2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", 
"2019", "2020"), station = c("CYKL", "CYKL", "CYKL", "CYKL", 
"CYKL", "CYKL", "CYKL", "CYKL", "CYKL", "CYKL", "CYKL", "CYKL", 
"CYKL", "CYKL", "CYKL", "CYKL", "CYKL", "CYKL", "CYKL", "CYKL", 
"CYKL", "CYKL", "CYKL", "CYKL", "CYKL", "CYKL", "CYKL", "CYKL", 
"CYKL", "CYKL", "CYKL", "CYKL", "CYKL", "CYKL", "CYKL", "CYKL", 
"CYKL", "CYKL", "CYKL", "CYKL"), yearly_mean = c(6.66913370615075, 
6.6097058904658, 6.63321092700948, 6.70511418186491, 6.54047991265942, 
6.6260569776694, 6.50415800515247, 6.24652925009775, 6.27738517719698, 
6.47124546258661, 6.27052514197829, 6.26910246690519, 6.05107795039979, 
6.23566751055814, 6.35965195833086, 6.11188147454036, 6.33670894555142, 
6.94058230299598, 6.73775601517714, 6.51150716559123, 6.56145315627824, 
6.65851518241965, 6.46056518559336, 7.14419354616137, 7.10972161555151, 
7.17952990504808, 6.09962477099579, 5.80124120310416, 6.33046468099031, 
6.31358338950744, 5.83258214159217, 6.45865945359556, 6.30835362857386, 
5.838640337478, 6.27950436041085, 6.33646147722408, 6.18548111344106, 
6.26207275160055, 6.470022441835, 6.28689494938971))

So far this is my code :

py <- all_y_means %>%
  ggplot(aes(x = year, y = yearly_mean, group = station)) +
  geom_line(aes(color = station))

py + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) + theme(legend.position = "none")

Can anyone help?

Thanks!

That's interesting data. Do you normally work with lists (not data.frames), or did you change the output from dput for a reason? (And I was right, your year is a string; it seems counter-intuitive to operate on years as a categorical variable.) - r2evans
Strangely, I didn't change the output of dput, but in my environment it is a df... The thing is, I started from a big dataset with hourly observations and aggregated it into daily, monthly, yearly, and seasonal observations. Every time an aggregation is made, the date, year, etc. variables are transformed into characters, and it's hard to keep track of what is what. I'll change it and try the code you sent me again, thanks. - Will_8011
"transformed into characters" should be an alarm bell that aggregation is being done in a harmful way. No aggregation should require that number-like fields be converted into strings (or factors), at least not permanently (and I'm not talking about using as.numeric, that's a hack). FYI, dput's output for frames starts with structure( and should have at least two more named entries after that list you pasted: row.names and class="data.frame" (optionally with "tbl_df" or "data.table"). - r2evans
Well, I have so little experience with this, I still have a lot to learn! I appreciate your feedback (very respectful even if I am quite a noob) and I want to thank you for your time. I will work on a couple of aspects of this and will try to post better structured questions in the future. Thanks again and have a great day! - Will_8011

2 Answers

1
votes

You could probably complete the x-axis of the plot by expanding the data.frame with the missing years. Assuming your dataframe is called df, you might do:

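library(tidyverse)
# year must have the same class in both frames; the question's data stores it as character
df$year <- as.integer(df$year)
df969798 <- data.frame(year = 1996:1998)
df <- bind_rows(df, df969798) %>% arrange(year)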

Make sure the year column has the same class in both data frames, preferably numeric or integer; bind_rows() will not combine a character column with an integer one. The other variables, station and yearly_mean, are filled with NA in the added rows. Then you can use the same code for your plot.
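One caveat with several stations: the added rows have station = NA, so they form a group of their own, and each station's line would still be drawn straight from 1995 to 1999. A minimal sketch of a per-station variant follows, assuming a long-format frame like the one in the question. The names toy, gap and toy_gap and all the values are made up for illustration.

```r
library(dplyr)

# Toy data (made up for illustration): two stations, 1996-1998 absent
toy <- data.frame(
  year = rep(c(1994, 1995, 1999, 2000), times = 2),
  station = rep(c("AAAA", "BBBB"), each = 4),
  yearly_mean = c(6.1, 6.9, 6.7, 6.5, 5.8, 6.0, 6.2, 6.3)
)

# One row per station and missing year; yearly_mean is omitted here,
# so bind_rows() fills it with NA
gap <- expand.grid(year = 1996:1998,
                   station = unique(toy$station),
                   stringsAsFactors = FALSE)

toy_gap <- bind_rows(toy, gap) %>% arrange(station, year)
```

Because every station now has its own NA rows for 1996 to 1998, geom_line() should break each line there instead of bridging the gap.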

1
votes

You can use tidyr::complete to fill out the missing components.

Random data:

library(ggplot2)
library(dplyr) # provides %>%, used further down
set.seed(2021)
n <- 12
random_data <- data.frame(
  year = rep(seq(1990, length.out = n), times = 2),
  station = rep(c("CWBY", "CWNH"), each = n),
  yearly_mean = runif(n*2, min = 1, max = 10)
)
random_data <- random_data[ -c(5,6), ]

Flawed image:

ggplot(random_data, aes(year, yearly_mean, group = station)) +
  geom_line(aes(color = station)) +
  geom_point(aes(color = station))

[image: flawed ggplot]

Fixed image, using tidyr::complete:

tidyr::complete(random_data, station, year)
# # A tibble: 24 x 3
#    station  year yearly_mean
#    <chr>   <dbl>       <dbl>
#  1 CWBY     1990        5.06
#  2 CWBY     1991        8.05
#  3 CWBY     1992        7.39
#  4 CWBY     1993        4.44
#  5 CWBY     1994       NA   
#  6 CWBY     1995       NA   
#  7 CWBY     1996        6.76
#  8 CWBY     1997        3.40
#  9 CWBY     1998        8.34
# 10 CWBY     1999        9.85
# # ... with 14 more rows

tidyr::complete(random_data, station, year) %>%
  ggplot(aes(year, yearly_mean, group = station)) +
  geom_line(aes(color = station)) +
  geom_point(aes(color = station))
# Warning: Removed 2 rows containing missing values (geom_point).

[image: fixed ggplot]

This can also be done in base R without tidyr:

eg <- expand.grid(year = unique(random_data$year),
                  station = unique(random_data$station))
ggplot(merge(random_data, eg, all = TRUE),
       aes(year, yearly_mean, group = station)) +
  geom_line(aes(color = station)) +
  geom_point(aes(color = station))
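One thing to watch with the question's data: complete(station, year) only builds combinations of values that already occur somewhere in the data. In the random data above, 1994 and 1995 are still present for the other station, but in the question 1996 to 1998 are missing for every station, so they would not be added. A possible sketch using tidyr::full_seq() to generate the full year range, with made-up toy data (toy and toy_full are illustrative names) and year stored as character as in the question:

```r
library(dplyr)
library(tidyr)

# Toy data (made up for illustration): year is a character column,
# and 1996-1998 are missing for all stations
toy <- data.frame(
  year = rep(c("1994", "1995", "1999", "2000"), times = 2),
  station = rep(c("AAAA", "BBBB"), each = 4),
  yearly_mean = c(6.1, 6.9, 6.7, 6.5, 5.8, 6.0, 6.2, 6.3),
  stringsAsFactors = FALSE
)

toy_full <- toy %>%
  # convert here only because the toy data mimics the question;
  # ideally year is kept numeric from the start
  mutate(year = as.numeric(year)) %>%
  # full_seq() spans min(year) to max(year) in steps of 1,
  # so years absent from every station are created too
  complete(station, year = full_seq(year, period = 1))
```

Piping toy_full into the same ggplot() call as above should then leave a gap in each line for 1996 to 1998.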