0
votes

I have the following dataset and would like to compute the average date (month and day) for each phenology stage (Pheno) and station across years. It seems I can apply the mean function directly to Date objects. However, if I convert the month and day to a date with as.Date, a year is added, so the average date is not independent of the year. How can I calculate the mean date based only on month and day?

[Image: the dataset]

2
Could you share your dataset in this question using 'dput()' and your expected outcome? - jhyeon

2 Answers

0
votes

You cannot compute a "mean month + day" independent of the year, since not every year has the same number of days. So you need to choose a fixed year for your computations.

Then you can:

  1. Create "dummy" date objects that have the correct month and day, but the previously selected year.
  2. Compute the mean of those dummies.
  3. Extract the month and day from the result (remove the year).
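
These three steps can be sketched in base R as follows. This is only a minimal illustration under assumptions: the sample data are made up, the column names Pheno, Month and Day are guessed from the question, and 2003 is an arbitrary choice of fixed year.

```r
# Hypothetical sample data; column names are assumed, not taken from the real dataset.
df <- data.frame(
  Pheno = c("Dormant", "Dormant", "Tillering", "Tillering"),
  Month = c(12, 12, 12, 12),
  Day   = c(10, 20, 5, 9)
)

# Arbitrary fixed year (assumption). A leap year such as 2004 would be
# needed if the data can contain 29 February.
fixed_year <- 2003

# 1. Dummy dates: real month and day, fixed year.
df$dummy <- as.Date(paste(fixed_year, df$Month, df$Day, sep = "-"))

# 2. Mean of the dummy dates per Pheno, computed on days since 1970-01-01.
mean_num <- tapply(as.numeric(df$dummy), df$Pheno, mean)

# 3. Convert back to a date and keep only month and day.
mean_date <- as.Date(round(as.vector(mean_num)), origin = "1970-01-01")
result <- data.frame(
  Pheno       = names(mean_num),
  avg_mon_day = format(mean_date, "%m-%d")
)
print(result)
```

To average per station as well, the grouping in step 2 would have to include the station column, for example via `interaction()` or `aggregate()`.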

0
votes

You can use the yday function from the lubridate package to convert each date into its day of the year, then average the day of the year for each Pheno. Converting the average day of the year back to a month and day depends on whether you want the date in a leap year or a non-leap year, so I report both dates.

The code looks like:

    library(tidyverse)
    library(lubridate)

    # Calculate the average day of the year for each Pheno
    # (add the station column to group_by() to average per station as well)
    average_doy <- df %>%
      mutate(day_of_year = yday(as.Date(paste(Year, Month, Day, sep = "-")))) %>%
      group_by(Pheno) %>%
      summarize(avg_doy = round(mean(day_of_year), 0))

    # Set base years
    non_leap_year <- 2003
    leap_year <- 2004

    # Convert the average day of the year to a month and day using the base years
    averages <- average_doy %>%
      mutate(avg_non_leap_year_mon_day = paste(avg_doy, non_leap_year, sep = "_") %>%
               as.Date(format = "%j_%Y") %>%
               format("%m-%d"),
             avg_leap_year_mon_day = paste(avg_doy, leap_year, sep = "_") %>%
               as.Date(format = "%j_%Y") %>%
               format("%m-%d"))
          

Using the first seven rows of your data, this gives

    # A tibble: 3 x 4
      Pheno         avg_doy avg_non_leap_year_mon_day avg_leap_year_mon_day
      <chr>           <dbl> <chr>                     <chr>
    1 Dormant           348 12-14                     12-13
    2 Tillering         343 12-09                     12-08
    3 Turning green      48 02-17                     02-17
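
The leap-year dependence of the conversion can also be checked in base R alone. A minimal sketch, using the same base years as above; the helper name doy_to_mon_day is hypothetical and not part of any package:

```r
# Hypothetical helper: convert a day-of-year number to "MM-DD" for a base year.
doy_to_mon_day <- function(doy, year) {
  # Day 1 is 1 January, so the offset from the start of the year is doy - 1.
  format(as.Date(doy - 1, origin = paste0(year, "-01-01")), "%m-%d")
}

doy_to_mon_day(c(348, 343, 48), 2003)  # non-leap year: "12-14" "12-09" "02-17"
doy_to_mon_day(c(348, 343, 48), 2004)  # leap year:     "12-13" "12-08" "02-17"
```

Days after 28 February shift by one between the two base years, while day 48 maps to 17 February in both.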