0
votes

I have found a few options from other posts but I am having trouble reproducing the code for my specific needs.

I have climate data that I would like to find the average based on season from 1910-2015 for specific locations.

Here is an example of what I need: the average PPT for CENTROID_ID c1763_1 in 1911 for Winter (Month 12 of the preceding year [1910], month 1 and 2 of year in question [1911]), Spring (months 3,4,5 in 1911), Summer (months 6,7,8 in 1911) and Fall (months 9,10,11 in 1911). This would then need to be done for all individual centroid IDs for every year. I have over 400 unique CENTROID_IDs covering years 1910-2015.

I envision the new dataframe to have the columns CENTROID_ID, YEAR, WINT_PPT, SPR_PPT, SUM_PPT, FALL_PPT.

 CENTROID_ID    YEAR MONTH  PPT
1       c1763_1 1910     1  52.639
2       c1763_1 1910     2  20.870
3       c1763_1 1910     3  21.706
4       c1763_1 1910     4   9.347
5       c1763_1 1910     5   1.201
6       c1763_1 1910     6  11.267
7       c1763_1 1910     7  41.870
8       c1763_1 1910     8  61.260
9       c1763_1 1910     9  27.815
10      c1763_1 1910    10  67.377
11      c1763_1 1910    11  24.719
12      c1763_1 1910    12  30.212
13      c1763_1 1911     1  88.728
14      c1763_1 1911     2  50.035
15      c1763_1 1911     3  37.720
16      c1763_1 1911     4  12.831
17      c1763_1 1911     5   0.739
18      c1763_1 1911     6  18.198
19      c1763_1 1911     7  74.731
20      c1763_1 1911     8  40.873
21      c1763_1 1911     9  86.340
22      c1763_1 1911    10  36.423
23      c1763_1 1911    11  12.491
24      c1763_1 1911    12  19.428
25      c1763_1 1912     1  11.010
26      c1763_1 1912     2  16.339
27      c1763_1 1912     3  72.017
28      c1763_1 1912     4  25.887
29      c1763_1 1912     5   5.314
30      c1763_1 1912     6   8.595
31      c1763_1 1912     7  47.781
32      c1763_1 1912     8  51.188
33      c1763_1 1912     9  10.931
34      c1763_1 1912    10 119.725
35      c1763_1 1912    11  10.420
36      c1763_1 1912    12   8.777
37      c1763_1 1913     1  27.771
38      c1763_1 1913     2  62.622
39      c1763_1 1913     3  17.533
40      c1763_1 1913     4   8.008
41      c1763_1 1913     5   1.423
42      c1763_1 1913     6   3.773
43      c1763_1 1913     7  42.982
44      c1763_1 1913     8  40.541
45      c1763_1 1913     9  58.495
46      c1763_1 1913    10  22.729
47      c1763_1 1913    11  48.130
48      c1763_1 1913    12  32.049
49      c1763_1 1914     1 104.197
50      c1763_1 1914     2  31.707
3
The title of your post asks about MAX and MIN, but your problem statement doesn't mention them.David T
Sorry, I forgot to edit. I realized that maybe that was too much for one question and to start with means first then try to figure out a max and min on my own if I could with the help of this question.TaraBBB

3 Answers

1
votes

You can create an auxiliary column of x$YEAR + x$MONTH/12 which helps bringing December for last year to current year. And then simply use aggregate of PPT over CENTROID_ID, floor(aux) and floor(aux %% 1 * 4). Then reshape to bring the seasons in the same row.

x$aux <- x$YEAR + x$MONTH/12
y <- aggregate(PPT ~ CENTROID_ID + cbind(YEAR=floor(aux)) + cbind(SEASON=c("WINT",
 "SPR", "SUM", "FALL")[1+floor(aux %% 1 * 4)]), x, mean)
reshape(do.call(data.frame, y), v.names = "PPT", timevar = "SEASON", idvar = 
 c("CENTROID_ID", "YEAR"), direction = "wide")
#   CENTROID_ID YEAR PPT.FALL  PPT.SPR  PPT.SUM PPT.WINT
#1      c1763_1 1910 39.97033 10.75133 38.13233 36.75450
#2      c1763_1 1911 45.08467 17.09667 44.60067 56.32500
#3      c1763_1 1912 47.02533 34.40600 35.85467 15.59233
#4      c1763_1 1913 43.11800  8.98800 29.09867 33.05667
#17     c1763_1 1914       NA       NA       NA 55.98433
1
votes

Is this it?

library(dplyr)
library(tibble)

# make fake data
dates <- expand.grid(1910:1950, 1:12)
dates <- dates[order(dates$Var1), ]
data <- tibble(
  CENTROID_ID = rep("c1763_1", 240),
  YEAR = dates$Var1[1:240],
  MONTH = dates$Var2[1:240],
  PPT = runif(min = 1, max = 100, n = 240)
)

And then we can determine the SEASON based on YEAR and MONTH, group and calculate the mean per group:

data <- data %>%
  mutate(SEASON = case_when(
    MONTH == 12 | MONTH == 1 | MONTH == 2 ~ "WINTER",
    MONTH == 3 | MONTH == 4 | MONTH == 5 ~ "SPRING",
    MONTH == 6 | MONTH == 7 | MONTH == 8 ~ "SUMMER",
    MONTH == 9 | MONTH == 10 | MONTH == 11 ~ "AUTUMN",
  )) %>%
  group_by(CENTROID_ID, YEAR, SEASON) %>%
  summarise(PPT_AVG = mean(PPT))

With this result:

> data
# A tibble: 80 x 4
# Groups:   CENTROID_ID, YEAR [20]
   CENTROID_ID  YEAR SEASON PPT_AVG
   <chr>       <int> <chr>    <dbl>
 1 c1763_1      1910 AUTUMN    35.7
 2 c1763_1      1910 SPRING    44.3
 3 c1763_1      1910 SUMMER    63.3
 4 c1763_1      1910 WINTER    37.1
 5 c1763_1      1911 AUTUMN    40.7
 6 c1763_1      1911 SPRING    52.3
 7 c1763_1      1911 SUMMER    36.7
 8 c1763_1      1911 WINTER    10.7
 9 c1763_1      1912 AUTUMN    45.4
10 c1763_1      1912 SPRING    45.7
# ... with 70 more rows

UPDATE To shift the year we need to introduce lead(YEAR, 1). Updated the code and included SEASON as a factor with defined levels such that it sorts in a decent chronological order.

data <- data %>%
  mutate(SEASON = case_when(
    MONTH == 12 | MONTH == 1 |MONTH == 2 ~ "WINTER",
    MONTH == 3 | MONTH == 4 | MONTH == 5 ~ "SPRING",
    MONTH == 6 | MONTH == 7 | MONTH == 8 ~ "SUMMER",
    MONTH == 9 | MONTH == 10 | MONTH == 11~ "AUTUMN",
  )) %>%
  mutate(SEASON = factor(SEASON, levels = c("WINTER", "SPRING", "SUMMER", "AUTUMN"))) %>%
  mutate(YEAR_LEAD = lead(YEAR, 1)) %>%
  group_by(CENTROID_ID, YEAR_LEAD, SEASON) %>%
  summarise(PPT_AVG = mean(PPT),
            PPT_MIN = min(PPT),
            PPT_MAX = max(PPT)) 

With this result:

> data
# A tibble: 81 x 6
# Groups:   CENTROID_ID, YEAR_LEAD [21]
   CENTROID_ID YEAR_LEAD SEASON PPT_AVG PPT_MIN PPT_MAX
   <chr>           <int> <fct>    <dbl>   <dbl>   <dbl>
 1 c1763_1          1910 WINTER    83.5   81.4     85.7
 2 c1763_1          1910 SPRING    72.3   52.7     96.0
 3 c1763_1          1910 SUMMER    49.9   10.9     90.0
 4 c1763_1          1910 AUTUMN    26.4    7.17    63.1
 5 c1763_1          1911 WINTER    60.9   19.0     92.6
 6 c1763_1          1911 SPRING    62.9   58.6     67.4
 7 c1763_1          1911 SUMMER    49.2   23.7     76.4
 8 c1763_1          1911 AUTUMN    43.9   15.1     84.4
 9 c1763_1          1912 WINTER    38.5   18.4     67.9
10 c1763_1          1912 SPRING    72.1   53.4     93.9
# ... with 71 more rows
0
votes

Simply assign a SEASON column then aggregate:

df <- within(df, {
          SEASON <- ifelse(MONTH %in% c(12, 1, 2), 'Winter',
                           ifelse(MONTH %in% c(3, 4, 5), 'Spring',
                                  ifelse (MONTH %in% c(6, 7, 8), 'Summer', 
                                          ifelse(MONTH %in% c(9, 10, 11), 'Fall', NA)
                                  )
                          )
                     )

          YEAR <- ifelse(MONTH == 12, YEAR + 1, YEAR)
      })

agg_df <- aggregate(PPT ~ CENTROID_ID + SEASON, df, FUN=mean)
agg_df
#   CENTROID_ID SEASON      PPT
# 1     c1763_1   Fall 43.79958
# 2     c1763_1 Spring 17.81050
# 3     c1763_1 Summer 36.92158
# 4     c1763_1 Winter 39.74171

Should you need multiple aggregations of PPT:

agg_raw <- aggregate(PPT ~ CENTROID_ID + SEASON, df, 
                     FUN=function(x) c(min=min(x), mean=mean(x), max=max(x)))

agg_df <- do.call(data.frame, agg_raw)
agg_df

#   CENTROID_ID SEASON PPT.min PPT.mean PPT.max
# 1     c1763_1   Fall  10.420 43.79958 119.725
# 2     c1763_1 Spring   0.739 17.81050  72.017
# 3     c1763_1 Summer   3.773 36.92158  74.731
# 4     c1763_1 Winter   8.777 39.74171 104.197