1
votes

I am trying to calculate rolling averages of Heart Rate over 15 second intervals. I have millisecond data for many participants and as such the millisecond values can potentially be repeated multiple times, and due to inconsistent time readings, creating intervals by row is not viable.

Below is a small sample of the data for one participant. Data for another participant would obviously feature different millisecond data taken at different intervals.

Ideal output would involve a new column with the rolling average for each value of millisecond data.

MS <- c(36148, 36753,37364,38062,38737,39580,40029,40387,41208,42006,42796, 43533,44274,44988,45696,46398,47079,47742,48429,49135,49861,50591,51324,52059)
HR <- c(84,84,84,84,84,96,84,84,96,84,84,96,84,84,96,84,84,84,84,84,84,84,84,84)

df <- data.frame(MS, HR)

I have tried a few packages (namely Zoo's suite of rolling functions) but have had trouble applying them to this problem.

Thank you!

4
Can you please add your expected output?sm925
Apologies for being unclear, and thanks for your help!KNichs
I have added edits which should clarify what I am looking for - if that is still unclear, I am looking for something pretty close to the answer provided below! The sample data is a short sample from a much longer series. And yes, the final value was an unfortunate typo.KNichs

4 Answers

2
votes

rollapplyr in zoo accepts a vector of widths and findInterval can be used to calculate the index in MS 15 seconds ago so if we subtract that from 1:n we get w, the number of positions to average. Exactly which intervals to produce is not discussed in the question so we will assumes that the right hand edge of each interval is at an input point.

library(zoo)

w <- with(df, seq_along(MS) - findInterval(MS - 15000, MS))
transform(df, roll = rollapplyr(HR, w, mean, fill = NA))
1
votes

An option using non-equi join in data.table which also handles an ID:

library(data.table)
setDT(df)[, avgHR := 
    df[.(ID=ID, start=MS-15000, end=MS), on=.(ID, MS>=start, MS<=end),
        by=.EACHI, mean(HR)]$V1
]

output:

    ID    MS HR    avgHR
 1:  1 36148 84 84.00000
 2:  1 36753 84 84.00000
 3:  1 37364 84 84.00000
 4:  1 38062 84 84.00000
 5:  1 38737 84 84.00000
 6:  1 39580 96 86.00000
 7:  1 40029 84 85.71429
 8:  1 40387 84 85.50000
 9:  1 41208 96 86.66667
10:  1 42006 84 86.40000
11:  1 42796 84 86.18182
12:  1 43533 96 87.00000
13:  1 44274 84 86.76923
14:  1 44988 84 86.57143
15:  1 45696 96 87.20000
16:  1 46398 84 87.00000
17:  1 47079 84 86.82353
18:  1 47742 84 86.66667
19:  1 48429 84 86.52632
20:  1 49135 84 86.40000
21:  1 49861 84 86.28571
22:  1 50591 84 86.18182
23:  1 51324 84 86.18182
24:  1 52059 84 86.18182
    ID    MS HR    avgHR

data:

MS <- c(36148, 36753,37364,38062,38737,39580,40029,40387,41208,42006,42796, 43533,44274,44988,45696,46398,47079,47742,48429,49135,49861,50591,51324,52059)
HR <- c(84,84,84,84,84,96,84,84,96,84,84,96,84,84,96,84,84,84,84,84,84,84,84,84)

df <- data.frame(ID=1, MS, HR)
0
votes

I'm not totally sure how you want to apply the 15s rolling average, but here is one way to go about what I think youre looking for. First we subset the data that is between 7.5s before and 7.5s after, then we take the average. This, however, will have an edge effect since there is no 7.5s before the first value.

library(tidyverse)

roll_vec <- c()
for(i in 1:nrow(df)){
  ref <- df$MS[[i]]
  val <- df %>%
    filter(MS <= ref + 7500 & MS >= ref- 7500) %>%
    pull(HR) %>%
    mean
  roll_vec[[i]] <- val
}


df %>%
  mutate(roll_15s = roll_vec) 
#>       MS HR roll_15s
#> 1  36148 84 87.00000
#> 2  36753 84 87.00000
#> 3  37364 84 86.76923
#> 4  38062 84 86.57143
#> 5  38737 84 86.57143
#> 6  39580 96 86.57143
#> 7  40029 84 86.57143
#> 8  40387 84 86.57143
#> 9  41208 96 86.57143
#> 10 42006 84 86.57143
#> 11 42796 84 86.57143
#> 12 43533 96 86.57143
#> 13 44274 84 87.00000
#> 14 44988 84 87.27273
#> 15  4569 96 96.00000


df %>%
  mutate(roll_15s = roll_vec) %>%
  ggplot(aes(MS, HR))+
  geom_line()+
  geom_line(aes(y = roll_15s), color = "blue")

Notice that in the plot, the black line is the raw data and the blue line is the 15s rolling average.

0
votes

One possible solution:

library(magrittr)
start_range <- df$MS[df$MS < max(df$MS)-15000]

lapply(start_range,function(t){
  data.frame(MS = mean(df$MS[df$MS %between% c(t,t+15000)]),
             HR = mean(df$HR[df$MS %between% c(t,t+15000)]))
}) %>% Reduce(rbind,.)

        MS       HR
1 43218.00 86.18182
2 43907.82 86.18182
3 44603.55 86.18182
4 44948.29 86.28571
5 45673.38 86.33333

I added some points to your data (I had only two points with the data you give):

MS <- c(36148, 36753,37364,38062,38737,39580,40029,40387,41208,42006,42796, 43533,44274,44988,45696,46398,47079,47742,48429,49135,49861,50591,51324,52059,53289,54424)
HR <- c(84,84,84,84,84,96,84,84,96,84,84,96,84,84,96,84,84,84,84,84,84,84,84,84,85,88)
df <- data.frame(MS, HR)

The idea here is to calculate, for each MS value, the mean of HR and the time MSof all points having a time between this value (t in lapply) and 15 s after. I restrict that on the range where I have values encompassing the 15s : the start_range vector.