R: calculating mean and sd for each week of the year

Question

I have 4 data frames, each data frame corresponding to a single year. Each data frame contains daily rainfall for five locations.

Generate sample data

    location <- c("A","B","C","D","E")
    mat <- round(as.data.frame(matrix(runif(1825),nrow=5,ncol=365)), digits=2)
    dat.1981 <-as.data.frame(cbind(location,mat)) # rainfall for 1981
    dat.1981$year <- 1981

    mat <- round(as.data.frame(matrix(runif(1825),nrow=5,ncol=365)), digits = 2)
    dat.1982 <- as.data.frame(cbind(location,mat)) # rainfall for 1982
    dat.1982$year <- 1982

    mat <- round(as.data.frame(matrix(runif(1825),nrow=5,ncol=365)), digits = 2)
    dat.1983 <-as.data.frame(cbind(location,mat)) # rainfall for 1983
    dat.1983$year <- 1983

    mat <- round(as.data.frame(matrix(runif(1825),nrow=5,ncol=365)), digits = 2)
    dat.1984 <-as.data.frame(cbind(location,mat)) # rainfall for 1984
    dat.1984$year <- 1984

    dat <- as.data.frame(rbind(dat.1981,dat.1982,dat.1983,dat.1984))

For each year, I want to classify whether a day was an extreme wet day or not

Here's how I do my calculation:

1) For each location, generate the mean and sd of rainfall for every week for the period 1981 to 1984. For example, in location A the mean rainfall for the first week will be:

(First week rain 1981 in A + First week rain 1982 in A + First week rain 1983 in A + First week rain 1984 in A)/4

which can be written in R as

    mean.week1.loc1 <- mean(rowSums(dat[dat$location=="A",2:8])) # 2:8 selects the first 7 days in each year
    sd.week1.loc1 <- sd(rowSums(dat[dat$location=="A",2:8])) 

    wet.cr <- mean.week1 + sd.week1 # this is my threshold for defining a wet day

If daily rainfall in week 1 for the years 1981 to 1984 in location A is greater than wet.cr, that day is a wet day and hence gets a value of 1

As an example, to examine whether rainfall of week 1 for location A for 1981 to 1984 is a wet day or not I can do the following:

   lapply(dat[, 2:8], function(x) ifelse(x > wet.cr, 1, 0))

I want to repeat this for each week and each location.

However, I am unable to stitch these individual functions together and also my final results should be a dataframe same as dat but instead of rainfall values, I will have 1 or 0 defining whether it is a wet day or not.

EDIT

The solutions below gives me the following:

mean(c(rainfall 1981 day 1 week 1, ...., rainfall 1981 day 7 week 1, rainfall 1982 day 1 week 1,....,rainfall 1982 day 7 week 1,....,rainfall 1984 day 1 week 1,....,rainfall 1984 day 7 week 1))

WHAT I WANT IS:

mean(c(mean(total rainfall week 1 1981), mean(total rainfall week 1 1982), mean(total rainfall week 1 1983), mean(total rainfall week 1 1984)))

I hope this is clear now.

This seems harder than it needs to be. If you have everything by date, use ISOweek to grab the year and week, then just aggregate over year in tidyr. If you give an example of the data with full dates instead of broken down in columns by week, I can show you.... — sconfluentus
I doubt that this kind of data is stored as shown, since not every year has 365 days. If you had dates, you could transform them to long, add a column which denotes week of year, group by that and get the mean and sd which you can compare the value to... Does the data exist as I expect or is it really as shown? — Tino
Notably, 1984 has 366 days. Which one is missing from your data frame? — Gregor Thomas
I have deleted the extra days in leap years so that all years have the same 365 days — 89_Simple
@Tino this is how I have the data as. I can try to manipulate it in the way that you suggested. — 89_Simple

Tung Tung · Accepted Answer · 2018-01-09T18:24:44

A tidyverse solution

    library(magrittr)
    library(tidyverse)

    dat_m <- gather(dat, day, rainfall, -location, -year)
    str(dat_m)

    dat_m %<>%
      mutate(day = gsub("V", "", day)) %>%
      mutate(day = as.numeric(day)) %>% 
      mutate(week = as.integer(ceiling(day/7))) %>% 
      group_by(location, week) %>% 
      mutate(wet.cr = mean(rainfall, na.rm = TRUE) + sd(rainfall, na.rm = TRUE) ) %>% 
      mutate(indication = ifelse(rainfall > wet.cr, 1, 0)) %>% 
      ungroup()
    dat_m 

    # A tibble: 7,300 x 7
       location  year   day rainfall  week wet.cr indication
       <fctr>   <dbl> <dbl>    <dbl> <int>  <dbl>      <dbl>
     1 A         1981  1.00    0.880     1  0.845       1.00
     2 B         1981  1.00    0.850     1  0.829       1.00
     3 C         1981  1.00    1.00      1  0.877       1.00
     4 D         1981  1.00    0.100     1  0.755       0   
     5 E         1981  1.00    0.190     1  0.750       0   
     6 A         1982  1.00    0.380     1  0.845       0   
     7 B         1982  1.00    0.760     1  0.829       0   
     8 C         1982  1.00    0.940     1  0.877       1.00
     9 D         1982  1.00    0.900     1  0.755       1.00
    10 E         1982  1.00    0.600     1  0.750       0   
    # ... with 7,290 more rows

Edit: For rainfall, I think it's better to use sum (total) than mean

So we first calculate the total weekly rainfall for every year. Then we calculate the long-term average & stdev of the total weekly rainfall.

    dat_m %<>%
      mutate(day = as.numeric(gsub("V", "", day)),
             week = as.integer(ceiling(day/7))) %>%
      group_by(location, week, year) %>% 
      mutate(total_weekly_rainfall = sum(rainfall, na.rm = TRUE)) %>% 
      ungroup() %>% 
      group_by(location, week) %>% 
      mutate(mean_weekly_rainfall = sum(rainfall, na.rm = TRUE)/length(unique(year)),
             stddev_weekly_rainfall = sd(rainfall, na.rm = TRUE),
             wet.cr =  mean_weekly_rainfall + stddev_weekly_rainfall,
             indication = ifelse(total_weekly_rainfall > wet.cr, 1, 0)) %>% 
      arrange(location, year, day) %>% 
      ungroup() %>% 
      distinct(location, year, week, .keep_all = TRUE)
    dat_m 

    # A tibble: 1,060 x 10
       location  year   day rainfall  week total_wee~ mean_wee~ stddev_w~ wet.~ indic~
       <fctr>   <dbl> <dbl>    <dbl> <int>      <dbl>     <dbl>     <dbl> <dbl>  <dbl>
     1 A         1981  1.00   0.880      1     0.880      0.630     0.277 0.907      0
     2 A         1981  8.00   0.190      2     0.190      0.330     0.431 0.761      0
     3 A         1981 15.0    0.630      3     0.630      0.548     0.331 0.878      0
     4 A         1981 22.0    0.0300     4     0.0300     0.290     0.259 0.549      0
     5 A         1981 29.0    0.360      5     0.360      0.308     0.196 0.504      0
     6 A         1981 36.0    0.540      6     0.540      0.500     0.225 0.725      0
     7 A         1981 43.0    0.0300     7     0.0300     0.375     0.289 0.664      0
     8 A         1981 50.0    0.170      8     0.170      0.332     0.375 0.708      0
     9 A         1981 57.0    0.260      9     0.260      0.652     0.272 0.924      0
    10 A         1981 64.0    0.590     10     0.590      0.512     0.202 0.715      0
    # ... with 1,050 more rows

R: calculating mean and sd for each week of the year

3 Answers