1
votes

I am trying to merge values in a dataframe by every nth row. The data structure looks as follows:

id  value
1   1
2   2
3   1
4   2
5   3
6   4
7   1
8   2
9   4
10  4
11  2
12  1

I like to aggregate the values for every 4 rows each. Actually, the dataset describes a measurement for each a 4-day period.

id"1" = day1,
id"2" = day2,
id"3" = day3,
id"4" = day4,
id"5" = day1,
...

As such, a column counting in a loop from 1 to 4 might be used?

The result should look like (sums):

day sum
1   8
2   10
3   4
4   5
2

2 Answers

3
votes

This can be achieved with %% for creating a grouping variable and then do the sum with aggregate

n <- 4
aggregate(value ~cbind(day = (seq_along(df1$id)-1) %% n + 1), df1, FUN = sum)
#   day value
#1   1     8
#2   2    10
#3   3     4
#4   4     5

This approach can also be used with dplyr/data.table

library(dplyr)
df1 %>%
     group_by(day = (seq_along(id)-1) %% 4 +1) %>%
      summarise(value = sum(value))
#     day value
#  <dbl> <int>
#1     1     8
#2     2    10
#3     3     4
#4     4     5

or

setDT(df1)[, .(value = sum(value)), .(day = (seq_along(id) - 1) %% 4 + 1)]
#   day value
#1:   1     8
#2:   2    10
#3:   3     4
#4:   4     5
1
votes

You need to make a sequence to group by, e.g.

rep(1:4, length = nrow(df))
## [1] 1 2 3 4 1 2 3 4 1 2 3 4

In aggregate:

aggregate(value ~ cbind(day = rep(1:4, length = nrow(df))), df, FUN = sum)

##   day value
## 1   1     8
## 2   2    10
## 3   3     4
## 4   4     5

or dplyr:

library(dplyr)

df %>% group_by(day = rep(1:4, length = n())) %>% summarise(sum = sum(value))

## # A tibble: 4 x 2
##     day   sum
##   <int> <int>
## 1     1     8
## 2     2    10
## 3     3     4
## 4     4     5

or data.table:

library(data.table)

setDT(df)[, .(sum = sum(value)), by = .(day = rep(1:4, length = nrow(df)))]

##    day sum
## 1:   1   8
## 2:   2  10
## 3:   3   4
## 4:   4   5