Create day index based on water year

Question

I'm working with hydrologic data, and I need water years. I've successfully created a water year column based on a function previously posted on this forum. What I would like is to have an accompanying water day ranging from 1-365 (366 in leap years) to match the water year sequence. Note the water year is designated by the calendar year in which it ends. For example, the 2010 water year started on October 1, 2009 and ended on September 30, 2010.

Basically I want a column that starts a sequence from 1-365 (366 in leap years) whenever the date hits the first of October in each year.

My dataset ranges from 1950 to 2099, so I need someway to automate this process. The leap year component of this problem is giving me trouble.

I have provided code below to create the dataset.

# Create a data_set with datetime, year, month, day, jd, wtr_yr

library(lubridate)

datetime <- seq(as.Date("1950/1/1"), as.Date("2099/12/31"), by = "day")
year <- year(datetime)
month <- month(datetime)
day <- day(datetime)
julian_day <- yday(datetime)

# make dataframe with all time components
datetime_dataframe <- data.frame(datetime, year, month, day, julian_day)

# Create function to derive water year. Credit goes to Caner and MrFlick.

wtr_yr <- function(dates, start_month=10) {
  # Convert dates into POSIXlt
  dates.posix = as.POSIXlt(dates)
  # Year offset
  offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
  # Water year
  adj.year = dates.posix$year + 1900 + offset
  # Return the water year
  adj.year
}

#  Use fn to add water year to the dataframe
datetime_dataframe$wtr_yr <-wtr_yr(datetime_dataframe$datetime)

head(datetime_dataframe)

jazzurro jazzurro · Accepted Answer · 2018-01-06T02:34:58

Here is one possible way for you. I created the date sequence from 1949-10-01. I added water.year first. Whichever rows having month == 9 and day == 30 are flagged and marked as 1. The rest become 0. Using this column, I created a grouping variable for water year. cumsum(c(F, diff(water.year) < 0)) is doing the job. Once I had the grouping variable, I created the date sequence. I leave a part of the final outcome below.

library(dplyr)
library(lubridate)

mydata <- data.frame(date = seq(as.Date("1949-10-01"),as.Date("2099-12-31"), by = "day")) %>%
          mutate(year = year(date),
                 month = month(date),
                 day = day(date),
                 julian_day = yday(date),
                 water.year = if_else(month == 9 & day == 30, 1, 0))

mydata %>%
group_by(water.year = cumsum(c(F, diff(water.year) < 0))) %>%
mutate(water.day = 1:n())

         date year month day julian_day water.year water.day
1  1952-09-01 1952     9   1        245          2       337
2  1952-09-02 1952     9   2        246          2       338
3  1952-09-03 1952     9   3        247          2       339
28 1952-09-28 1952     9  28        272          2       364
29 1952-09-29 1952     9  29        273          2       365
30 1952-09-30 1952     9  30        274          2       366

Create day index based on water year

2 Answers