3
votes

I have a dataframe df with 3 columns (months, year, value).

>head(df)
        months year   value
      January  01 23875.00
     February  01 15343.25
        March  01  9584.25
        April  01 19026.33
          May  01 26324.00
         June  01 31228.00

Every 12 rows (starting from the first January), the year goes 02, 03, 04, etc.. until 16. I need to calculate seasonal means i.e. For Summer mean of (December,January,February); for Autumn mean of (March,April,May), for Winter mean of (June,July,August) and for Spring mean of (September,October,November).

Then make a new dataframe with seasons, year, and the mean value of them to get something like this.

>head(seasdf)
season year value
 DJF   01    
 MAM   01    
 JJA   01    
 SON   01    
 DJF   02    
 MAM   02    

With all the years until 16. I searched for similar questions with this kind of dataframe, but i couldn't find a way to do it.

Sorry for this noob question.

2

2 Answers

3
votes

We assume that adjacent months in the same quarter should all have the same quarter name and year and that quarters are named after the year in which the quarter ends. For example, Dec 2001, Jan 2002 and Feb 2002 would all be part of the DJF 2002 quarter.

First convert the year and month to a "yearmon" class variable, ym, and then add 1/12 to push the months forward one. This is based on the fact that yearmon variables are stored as the year + 0 for Jan, 1/12 for Feb, 2/12 for Mar, etc. Then convert that to a "yearqtr" class variable, yq. Now aggregate value by yq noting that yearqtr variables sort correctly so that 2001 Q1 will come before 2001 Q2, etc. Finally reconstitute the aggregated data frame with the columns shown in the question.

library(zoo)  # yearmon and yearqtr classes

ym <- as.yearmon(paste(DF$months, DF$year), "%B %y")
yq <- as.yearqtr(ym + 1/12)

Ag <- aggregate(value ~ yq, DF, mean)

season.name <- c("DJF", "MAM", "JJA", "SON")
with(Ag, data.frame(year = as.integer(yq), season = season.name[cycle(yq)], value))

giving:

  year season    value
1 2001    DJF 19609.12
2 2001    MAM 18311.53
3 2001    JJA 31228.00

If the exact layout shown in the question is not important then we could omit the last two lines of code above and just use Ag

> Ag
       yq    value
1 2001 Q1 19609.12
2 2001 Q2 18311.53
3 2001 Q3 31228.00

Note: The input DF in reproducible form was assumed to be:

DF <- structure(list(months = c("January", "February", "March", "April", 
"May", "June"), year = c("01", "01", "01", "01", "01", "01"), 
   value = c(23875, 15343.25, 9584.25, 19026.33, 26324, 31228
   )), .Names = c("months", "year", "value"), class = "data.frame", row.names = c(NA, -6L))
2
votes

Seems like your months variable is standard month name, you can match it against the month.name variable in R to get the month as a number, i.e(January will be 1, February will 2, etc), and take modulo division of 3 to get the season as another group variable aside from year, and then it should be trivial to group by year, season and take the average:

library(dplyr)
df %>% group_by(season = match(months, month.name) %% 12 %/% 3, year) %>% 
       summarise(value = mean(value)) %>% ungroup() %>% 

       # optional: convert the season from number to meaningful labels which could also be 
       # summer, autumn, winter and spring
       mutate(season = factor(season, levels = c(0,1,2,3), 
                                      labels = c("DJF", "MAM", "JJA", "SON")))

# A tibble: 3 × 3
#  season  year    value
#  <fctr> <int>    <dbl>
#1    DJF     1 19609.12
#2    MAM     1 18311.53
#3    JJA     1 31228.00

If December needs to be rolled to the next year Summer, you can add one to the year variable when months == "December":

df %>% group_by(season = match(months, month.name) %% 12 %/% 3, year = ifelse(months == "December", year + 1, year)) %>% 
       summarise(value = mean(value)) %>% ungroup() %>% 

       # optional: convert the season from number to meaningful labels which could also be 
       # summer, autumn, winter and spring
       mutate(season = factor(season, levels = c(0,1,2,3), 
                              labels = c("DJF", "MAM", "JJA", "SON")))