0
votes

I am trying to calculate the mean date independent of year for each level of a factor.

DF <- data.frame(Date = seq(as.Date("2013-2-15"), by = "day", length.out = 730))
DF$ID = rep(c("AAA", "BBB", "CCC"), length.out = 730)
head(DF)

        Date  ID
1 2013-02-15 AAA
2 2013-02-16 BBB
3 2013-02-17 CCC
4 2013-02-18 AAA
5 2013-02-19 BBB
6 2013-02-20 CCC

With the data above and the code below, I can calculate the mean date for each factor, but this includes the year.

I want a mean month and day across years. The preferred result would be a POSIXct time class formatted as month-day (e.g., 12-31 for Dec 31st), representing the mean month and day across multiple years.

library(dplyr)
# Mean date per ID; this still includes the year
DF2 <- DF %>%
    group_by(ID) %>%
    mutate(Col = mean(Date, na.rm = TRUE))
DF2

Addition: I am looking for the mean day of the year, with a month and day component, for each factor level. If the date represents, for example, the date an animal reproduced, I am not interested in the differences between years, but instead want a single mean day.

The end result would look like DF2, but with the new value calculated as previously described (mean day of the year with a month and day component).

Sorry this was not more clear.

2
I can think of several different ways to define this. The mean date across the whole time span, perhaps (it would have a year, month, and day component). Or the mean day of the year (it would have a month and a day component). Or the mean month, and then the mean day. - Mike Wise
Please show us the desired result. - Rich Scriven
I have added a few specifics per your request. - B. Davis
Well, with yday from data.table: DF %>% group_by(ID) %>% mutate(myday = mean(yday(Date))) I'm sure there's some analogue in the lubridate package. Convert back to "month and day" as you please. - Frank

2 Answers

3
votes

If I understand your question correctly, here's how to get a mean date column. I first extract the day of the year with yday from POSIXlt. I then calculate the mean. To get a date back, I have to add those days to an actual year, hence the creation of the Year object. As requested, I put the results in the same format as DF2 in your example.

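library(dplyr)
DF2 <- DF %>%
  # Year of each row, plus the zero-based day of the year (0 = Jan 1)
  mutate(Year = format(Date, "%Y"),
         Date_day = as.POSIXlt(Date)$yday) %>%
  group_by(ID) %>%
  # Mean day of year per ID, added to Jan 1 of the row's year, shown as month-day
  mutate(Col = mean(Date_day, na.rm = TRUE),
         Mean_date = format(as.Date(paste0(Year, "-01-01")) + Col, "%m-%d")) %>%
  select(Date, ID, Mean_date)
DF2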
> DF2
Source: local data frame [730 x 3]
Groups: ID [3]

         Date    ID Mean_date
       (date) (chr)     (chr)
1  2013-02-15   AAA     07-02
2  2013-02-16   BBB     07-02
3  2013-02-17   CCC     07-01
4  2013-02-18   AAA     07-02
5  2013-02-19   BBB     07-02
6  2013-02-20   CCC     07-01
7  2013-02-21   AAA     07-02
8  2013-02-22   BBB     07-02
9  2013-02-23   CCC     07-01
10 2013-02-24   AAA     07-02
..        ...   ...       ...
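
Adding the mean to January 1 works because yday from POSIXlt is zero-based (January 1 is day 0). A minimal base R sketch of that step, using two made-up dates and an arbitrary reference year of 2013 (both are assumptions for illustration, not taken from the question's data):

```r
# Two made-up dates in different years (illustrative only)
dates <- as.Date(c("2013-07-01", "2014-07-03"))

# Zero-based day of the year: January 1 is 0
doy <- as.POSIXlt(dates)$yday

# Mean day of year, added back to January 1 of a non-leap reference year
mean_doy <- mean(doy)
mean_md <- format(as.Date("2013-01-01") + mean_doy, "%m-%d")
mean_md
```

Because every day after February falls one position later in a leap year, the result can shift by a day depending on which years are in the data and which reference year is used.
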
0
votes

You can take the mean of dates with the mean function. Note, however, that the result depends on the data type. For POSIXct, the mean is returned as a date and time, much as the mean of a set of integers is usually not an integer. For Date, the result is printed as a whole date with no time of day, so it looks as if the mean were 'rounded' to a date.

For example, I recently took a mean of dates. Look at the output when different data types are used.

> mean(as.Date(stationPointDf$knockInDate))
[1] "2018-06-04"
> mean(as.POSIXct(stationPointDf$knockInDate))
[1] "2018-06-03 21:19:21 CDT"
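
Since stationPointDf is not shown, here is a small self-contained sketch of the same comparison on two made-up dates. It assumes the dates are interpreted in UTC; with another time zone the printed POSIXct value would shift, as in the CDT output above.

```r
# Two made-up dates three days apart (illustrative only)
d <- as.Date(c("2018-06-01", "2018-06-04"))

# Mean of Date values is still a Date, but it can hold a fractional day internally
m_date <- mean(d)
class(m_date)
days_after_first <- as.numeric(m_date) - as.numeric(d[1])
days_after_first

# Mean of the same instants as POSIXct shows the time of day explicitly
m_time <- mean(as.POSIXct(d, tz = "UTC"))
m_time_txt <- format(m_time, "%Y-%m-%d %H:%M", tz = "UTC")
m_time_txt
```
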

If I am looking for a mean month and day across years, I convert all the dates to have the current year using the lubridate package.

library(lubridate)
year(myVectorOfDates) <- 2018

Then, I compute the mean and drop the year.
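
A minimal sketch of that last step, on made-up dates. To keep it free of package dependencies, it uses base R's format as a stand-in for lubridate's year<- (an assumption of this sketch, not the code from the answer); the variable names are hypothetical.

```r
# Made-up dates from three different years (illustrative only)
dates <- as.Date(c("2016-03-10", "2017-03-20", "2018-03-30"))

# Put every date in the same reference year (2018), keeping month and day
same_year <- as.Date(format(dates, "2018-%m-%d"))

# Take the mean, then drop the year by formatting as month-day
mean_md <- format(mean(same_year), "%m-%d")
mean_md
```

A date of February 29 has no counterpart in a non-leap reference year such as 2018, so data containing it would need separate handling.
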