0
votes

version R version 4.0.5 (2021-03-31) os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United Kingdom.1252 ctype English_United Kingdom.1252 tz Europe/London
date 2021-08-08

Greetings everyone,

I am trying to calculate some variables in a data frame imported from excel but I am missing something which I don't seem to find. I imagine that it is a pretty specific situation as my search in youtube tutorials for "for loop", StackOverflow posts, and Google, in general, did not help so far. Thus, I thought of posting as my last resort to find a solution from more experience programmers.

I have a dataset with 1392 rows and multiple columns:

> summary(twente_1)
 Player.Number   Playing.Position       Date                Week      Training.type      Total.Distance 
 Min.   : 1.00   Length:3192        Length:3192        Min.   : 1.0   Length:3192        Min.   :    0  
 1st Qu.: 3.75   Class :character   Class :character   1st Qu.:10.0   Class :character   1st Qu.:    0  
 Median : 7.00   Mode  :character   Mode  :character   Median :19.5   Mode  :character   Median : 3669  
 Mean   : 9.25                                         Mean   :19.5                      Mean   : 3757  
 3rd Qu.:15.50                                         3rd Qu.:29.0                      3rd Qu.: 5500  
 Max.   :19.00                                         Max.   :38.0                      Max.   :19226  
 NA's   :3180                                          NA's   :2736                                     
      HSR             SD         High.Intensity.Actions..acc.dec.  Player.Load     SUM.Weekly.Total.Distance
 Min.   :   0   Min.   :  0.00   Min.   : 0.00                    Min.   :   0.0   Min.   :    0            
 1st Qu.:   0   1st Qu.:  0.00   1st Qu.: 0.00                    1st Qu.:   0.0   1st Qu.:21304            
 Median :  22   Median :  0.00   Median :12.00                    Median :  89.0   Median :27969            
 Mean   : 123   Mean   : 23.21   Mean   :15.18                    Mean   : 191.2   Mean   :26298            
 3rd Qu.: 168   3rd Qu.: 20.00   3rd Qu.:24.00                    3rd Qu.: 240.0   3rd Qu.:32727            
 Max.   :1590   Max.   :475.00   Max.   :90.00                    Max.   :1777.0   Max.   :50194            
                                                                                   NA's   :2736             
    SUM.HSR           SUM.SD        SUM.ACC.DEC    SUM.Player.Load Daily.Mean     St.Deviation  
 Min.   :   0.0   Min.   :  0.00   Min.   :  0.0   Min.   :   0    Mode:logical   Mode:logical  
 1st Qu.: 552.0   1st Qu.: 57.25   1st Qu.: 67.0   1st Qu.: 876    NA's:3192      NA's:3192     
 Median : 843.5   Median :142.00   Median :104.0   Median :1318                                 
 Mean   : 861.0   Mean   :162.50   Mean   :106.3   Mean   :1339                                 
 3rd Qu.:1164.2   3rd Qu.:235.00   3rd Qu.:147.0   3rd Qu.:1799                                 
 Max.   :3504.0   Max.   :711.00   Max.   :259.0   Max.   :3373                                 
 NA's   :2736     NA's   :2736     NA's   :2736    NA's   :2736                                 
 Monotony.Total.Distance Monotony.HSR   Monotony.SD    Monotony.High.Intensity.Actions Monotony.Player.Load
 Mode:logical            Mode:logical   Mode:logical   Mode:logical                    Mode:logical        
 NA's:3192               NA's:3192      NA's:3192      NA's:3192                       NA's:3192           
                                                                                                           
                                                                                                           
                                                                                                           
                                                                                                           
                                                                                                           
 Strain.Total.Distance Strain.HSR     Strain.SD      Strain.High.Intensity.Actions Strain.Player.Load
 Mode:logical          Mode:logical   Mode:logical   Mode:logical                  Mode:logical      
 NA's:3192             NA's:3192      NA's:3192      NA's:3192                     NA's:3192
> head(twente_1)
  Player.Number Playing.Position       Date Week Training.type Total.Distance HSR  SD
1             1               ED 11/08/2018    1         'OFF'              0   0   0
2            NA                  12/08/2018   NA         'OFF'              0   0   0
3            NA                  13/08/2018   NA          'TT'           4599  72   0
4            NA                  14/08/2018   NA          'TT'           6328 213 104
5            NA                  15/08/2018   NA          'TT'           5522 264  22
6            NA                  16/08/2018   NA          'TT'           2873  14   0
  High.Intensity.Actions..acc.dec. Player.Load SUM.Weekly.Total.Distance SUM.HSR SUM.SD SUM.ACC.DEC
1                                0           0                     31953    1205    298         113
2                                0           0                        NA      NA     NA          NA
3                               16         141                        NA      NA     NA          NA
4                               25         362                        NA      NA     NA          NA
5                               15         283                        NA      NA     NA          NA
6                               16          66                        NA      NA     NA          NA
  SUM.Player.Load Daily.Mean St.Deviation Monotony.Total.Distance Monotony.HSR Monotony.SD
1            1843         NA           NA                      NA           NA          NA
2              NA         NA           NA                      NA           NA          NA
3              NA         NA           NA                      NA           NA          NA
4              NA         NA           NA                      NA           NA          NA
5              NA         NA           NA                      NA           NA          NA
6              NA         NA           NA                      NA           NA          NA
  Monotony.High.Intensity.Actions Monotony.Player.Load Strain.Total.Distance Strain.HSR Strain.SD
1                              NA                   NA                    NA         NA        NA
2                              NA                   NA                    NA         NA        NA
3                              NA                   NA                    NA         NA        NA
4                              NA                   NA                    NA         NA        NA
5                              NA                   NA                    NA         NA        NA
6                              NA                   NA                    NA         NA        NA
  Strain.High.Intensity.Actions Strain.Player.Load player_load_sd
1                            NA                 NA             NA
2                            NA                 NA             NA
3                            NA                 NA             NA
4                            NA                 NA             NA
5                            NA                 NA             NA
6                            NA                 NA             NA

I want to create calculate some new variables and store them in specific cells. For example, I want to find the Standard Deviation of each week (a total of 1392 rows, that is 456 weeks).

I "figured out" the code to do it manually:

twente_1$player_load_sd[1] = sd(twente_1$Player.Load[1:7])
twente_1$player_load_sd[2] = sd(twente_1$Player.Load[8:14])
twente_1$player_load_sd[3] = sd(twente_1$Player.Load[15:21])
twente_1$player_load_sd[4] = sd(twente_1$Player.Load[22:28])
twente_1$player_load_sd[5] = sd(twente_1$Player.Load[29:35])
twente_1$player_load_sd[6] = sd(twente_1$Player.Load[36:42])
twente_1$player_load_sd[7] = sd(twente_1$Player.Load[43:49])
twente_1$player_load_sd[8] = sd(twente_1$Player.Load[50:56])
twente_1$player_load_sd[9] = sd(twente_1$Player.Load[57:63])
twente_1$player_load_sd[10] = sd(twente_1$Player.Load[64:70])

I am sure that I can do that with a "for loop" but I cannot succeed doing it. I have tried the code below, but it gives me NAs:

x <- 1
y <- 7
for (i in 1:456) {
        twente_1$player_load_sd[i] = sd(twente_1$Player.Load[x:y])
        x <- x+7
        y <- y+7
}

Thank you in advance for your time and help.

1

1 Answers

0
votes

Instead of a for loop, I would 1) create a week variable, 2) group the dataset by week, and 3) calculate the SD for each week using the grouped dataset. Here's what that would look like:

Here's a sample dataset consisting of 10 weeks of data. (Install the tidyverse library if you don't have it already.)

library(tidyverse)
df <- tibble(
  day = 1:70,
  x = runif(70, 0, 100)
)

First, let's create a week variable by putting rows into groups of 7.

df <- 
  df %>% 
  mutate(
    week = rep(1:(nrow(df)/7), each = 7)
  )

Next, group the dataset by week and calculate the SD of x. Don't forget to ungroup at the end!

df <- 
  df %>% 
  group_by(week) %>% 
  mutate(week_sd = sd(x)) %>% 
  ungroup()

We can look at the first 14 days (i.e., two weeks) to see that the SD for each week is saved in each individual row.

head(df, 14)

If you instead want a new dataset with one row per week, you can group and summarize instead:

df_week <- 
  df %>% 
  group_by(week) %>% 
  summarize(week_sd = sd(x)) %>% 
  ungroup()

df_week