I would like to perform calculations on multiple columns using tidyverse. I know how to do it for a single user (represented in a single column), but I need to do it for 1000+ users (and thus an equal amount of columns).
However, I'm not that well acquainted with using tidyverse and calculating with tibbles, but I've had some earlier help on this platform (the exact coding differs from the one below, but I brought it down to the core issue).
The dataset contains all hours of a year (8760 values, 365 days with each 24 hours) accompanied by values for multiple users.
Per user, I need to aggregate the positive values between a specific timeframe (e.g. everything between 00:00 and 03:00), subtract those from the aggregate values between 03:00 and 05:00 (regardless whether these values are positive of negative). In total there are 1000+ users.
library(tidyverse)
library(lubridate)
set.seed(4)
time_index <- seq(
from = as.POSIXct("2016-01-01 00:00"),
to = as.POSIXct("2016-12-31 23:00"),
by = "hour"
)
user1 <- runif(length(time_index), min = -1, max = 1)
user2 <- runif(length(time_index), min = -1, max = 1)
user3 <- runif(length(time_index), min = -1, max = 1)
example <- data.frame(time_index, user1, user2, user3)
The code for a single column(user) is:
df_intermediate <- example %>%
mutate(
date = as_date(time_index),
hour = hour(time_index),
hour_block = case_when(
between(hour, 0, 2) ~ "block_1",
between(hour, 3, 5) ~ "block_2",
TRUE ~ NA_character_
)
) %>%
filter(!is.na(hour_block)) %>%
group_by(date, hour_block) %>%
nest() %>%
ungroup() %>%
mutate(
intermediate_result = if_else(
hour_block == "block_1",
map_dbl(data, ~ sum(.$user[.$user> 0 ])),
map_dbl(data, ~ sum(.$user))
)
) %>%
group_by(date) %>%
summarise(
final_result = first(intermediate_result) - last(intermediate_result)
)
This gives the following results for a single user:
df_intermediate
#> # A tibble: 366 x 2
#> date final_result
#> <date> <dbl>
#> 1 2016-01-01 0.469
#> 2 2016-01-02 0.189
#> 3 2016-01-03 -1.32
I have not been able to scale it up to multiple users. I looked at using mutate_at or writing an own function to include in the mutate_at, but I do not know how to include the condition (there should only be positive values in the "first_block"), and the multitude of columns. So how could this be mutated for multiple columns instead of just a single one?