1
votes

Is there a built-in way to calculate sums over different rows and columns? I know that I could form a new dataframe from id, drug, day2, sum_d2, rename the last two columns, delete these columns in the "old" dataframe, perform rbind with the "old" dataframe and summarise by group. But this seems strangely complicated and possibly error-prone.

How do I calculate the sum of sum_1 and sum_2 of drug_a given on 2020-01-02 using id + drugname as grouping variables + day1 + day2 (when these two are the same)?

The reason for this format is that I have to split the dosage of a continous infusion at midnight...

Example data:

id <- c(rep(1,2))
drug <- c(rep("Drug_a",2))                     
day1 <- c(rep("2020-01-01",1),rep("2020-01-02",1))
sum_1 <- c(rep(250,1),rep(550,1))
day2 <- c(rep("2020-01-02",1),rep("2020-01-03",1))
sum_2 <- c(rep(100,1),rep(75,1))

example_data <- data.frame(id,drug,day1,sum_1,day2,sum_2)

  id   drug       day1 sum_1       day2 sum_2
1  1 Drug_a 2020-01-01   250 2020-01-02   100
2  1 Drug_a 2020-01-02   550 2020-01-03    75

Expected output within these lines:

  id   drug       day    sum
1  1 Drug_a 2020-01-01   250 
2  1 Drug_a 2020-01-02   650 
3  1 Drug_a 2020-01-03    75
2
Can you show your expected output? edit the question and add it to the body of your post. Cheers.M--

2 Answers

2
votes

Perhaps something like this might work. You can use pivot_longer to put day and sum in single columns (i.e., combine day_1 and day_2 into day, sum_1 and sum_2 into sum).

library(tidyverse)

example_data %>%
  pivot_longer(cols = c(-id, -drug), names_to = c(".value", "group"), names_sep = "_") %>%
  group_by(id, drug, day) %>%
  summarise (total = sum(sum))

# A tibble: 3 x 4
# Groups:   id, drug [1]
     id drug   day        total
  <dbl> <fct>  <fct>      <dbl>
1     1 Drug_a 2020-01-01   250
2     1 Drug_a 2020-01-02   650
3     1 Drug_a 2020-01-03    75

Data

id <- c(rep(1,2))
drug <- c(rep("Drug_a",2))                     
day_1 <- c(rep("2020-01-01",1),rep("2020-01-02",1))
sum_1 <- c(rep(250,1),rep(550,1))
day_2 <- c(rep("2020-01-02",1),rep("2020-01-03",1))
sum_2 <- c(rep(100,1),rep(75,1))

example_data <- data.frame(id,drug,day_1,sum_1,day_2,sum_2)
1
votes

We can use melt from data.table

library(data.table)
melt(setDT(example_data), measure = patterns('^day', '^sum'),
    value.name = c('day', 'sum'))[, .(total = sum(sum)), .(id, drug, day)]
#   id   drug        day total
#1:  1 Drug_a 2020-01-01   250
#2:  1 Drug_a 2020-01-02   650
#3:  1 Drug_a 2020-01-03    75

data

id <- c(rep(1,2))
drug <- c(rep("Drug_a",2))                     
day_1 <- c(rep("2020-01-01",1),rep("2020-01-02",1))
sum_1 <- c(rep(250,1),rep(550,1))
day_2 <- c(rep("2020-01-02",1),rep("2020-01-03",1))
sum_2 <- c(rep(100,1),rep(75,1))

example_data <- data.frame(id,drug,day_1,sum_1,day_2,sum_2)