4
votes

I'm working with hydrologic data, and I need water years. I've successfully created a water year column based on a function previously posted on this forum. What I would like is to have an accompanying water day ranging from 1-365 (366 in leap years) to match the water year sequence. Note the water year is designated by the calendar year in which it ends. For example, the 2010 water year started on October 1, 2009 and ended on September 30, 2010.

Basically I want a column that starts a sequence from 1-365 (366 in leap years) whenever the date hits the first of October in each year.

My dataset ranges from 1950 to 2099, so I need someway to automate this process. The leap year component of this problem is giving me trouble.

I have provided code below to create the dataset.

# Create a data_set with datetime, year, month, day, jd, wtr_yr

library(lubridate)

datetime <- seq(as.Date("1950/1/1"), as.Date("2099/12/31"), by = "day")
year <- year(datetime)
month <- month(datetime)
day <- day(datetime)
julian_day <- yday(datetime)

# make dataframe with all time components
datetime_dataframe <- data.frame(datetime, year, month, day, julian_day)

# Create function to derive water year. Credit goes to Caner and MrFlick.

wtr_yr <- function(dates, start_month=10) {
  # Convert dates into POSIXlt
  dates.posix = as.POSIXlt(dates)
  # Year offset
  offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
  # Water year
  adj.year = dates.posix$year + 1900 + offset
  # Return the water year
  adj.year
}

#  Use fn to add water year to the dataframe
datetime_dataframe$wtr_yr <-wtr_yr(datetime_dataframe$datetime)

head(datetime_dataframe)
2

2 Answers

1
votes

Here is one possible way for you. I created the date sequence from 1949-10-01. I added water.year first. Whichever rows having month == 9 and day == 30 are flagged and marked as 1. The rest become 0. Using this column, I created a grouping variable for water year. cumsum(c(F, diff(water.year) < 0)) is doing the job. Once I had the grouping variable, I created the date sequence. I leave a part of the final outcome below.

library(dplyr)
library(lubridate)

mydata <- data.frame(date = seq(as.Date("1949-10-01"),as.Date("2099-12-31"), by = "day")) %>%
          mutate(year = year(date),
                 month = month(date),
                 day = day(date),
                 julian_day = yday(date),
                 water.year = if_else(month == 9 & day == 30, 1, 0))

mydata %>%
group_by(water.year = cumsum(c(F, diff(water.year) < 0))) %>%
mutate(water.day = 1:n())

         date year month day julian_day water.year water.day
1  1952-09-01 1952     9   1        245          2       337
2  1952-09-02 1952     9   2        246          2       338
3  1952-09-03 1952     9   3        247          2       339
28 1952-09-28 1952     9  28        272          2       364
29 1952-09-29 1952     9  29        273          2       365
30 1952-09-30 1952     9  30        274          2       366
1
votes

This should work. I just used difftime and noted you want wtr_day to be the number of days from 9-30 in the previous wtr_yr.

library(tidyverse)
new_df <- datetime_dataframe %>%
  group_by(wtr_yr) %>% 
  mutate(wtr_day = (as.integer(difftime(datetime,ymd(paste0(wtr_yr - 1 ,'-09-30')), units = "days"))))