2
votes

I have a data frame of classes, ordered by period (or block) within a day. I'd like to add a variable that numbers each run of identical classes as a series, but only when those classes come one after another. So two math classes in periods 4 and 5 would be one group, while the math classes in periods 7 and 8 would be a different group. I'm interested in a dplyr method, but other methods will work as well.

I've tried to do group_by with mutate, but I'm missing a step.

df <- data.frame(
  period = c(1:8),
  classes = c("hist", "hist", "hist",
          "math", "math",
          "physics",
          "math", "math")
)

I want the following output:

df <- data.frame(
  period = c(1:8),
  classes = c("hist", "hist", "hist",
          "math", "math",
          "physics",
          "math", "math"),
  series = c(1, 1, 1, 2, 2, 3, 4, 4)
)

3 Answers

2
votes

We can also use rleid from data.table:

library(data.table)
setDT(df)[, series := rleid(classes)]

In a dplyr pipe:

library(dplyr)
df %>%
  mutate(series = data.table::rleid(classes))

Output:

   period classes series
1:      1    hist      1
2:      2    hist      1
3:      3    hist      1
4:      4    math      2
5:      5    math      2
6:      6 physics      3
7:      7    math      4
8:      8    math      4
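
For a version that stays within dplyr, here is a minimal sketch. It assumes the rows are already ordered by period and that classes contains no NA values. It flags every row whose class differs from the previous row and takes a running sum of those flags. The helper column name `changed` is only illustrative.

```r
library(dplyr)

df <- data.frame(
  period = 1:8,
  classes = c("hist", "hist", "hist",
              "math", "math",
              "physics",
              "math", "math")
)

result <- df %>%
  mutate(
    # TRUE where the class differs from the previous row; the first row is
    # compared with itself via `default`, so it is never flagged
    changed = classes != lag(classes, default = first(classes)),
    # running count of changes, shifted so that numbering starts at 1
    series = cumsum(changed) + 1
  ) %>%
  select(-changed)

result$series
# 1 1 1 2 2 3 4 4
```

Unlike group_by(classes), this does not merge the two separate math runs, because only neighbouring rows are compared.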
2
votes

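You can use rle() from base R:

# rle() stores the run lengths in $lengths
rle_length <- rle(as.character(df$classes))$lengths

# repeat each run number as many times as the run is long
df$series <- rep(seq_along(rle_length), rle_length)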

> df
  period classes series
1      1    hist      1
2      2    hist      1
3      3    hist      1
4      4    math      2
5      5    math      2
6      6 physics      3
7      7    math      4
8      8    math      4
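
To see why this works, it may help to look at what rle() returns: one entry per run, holding the run's value and its length. A small sketch on the same classes vector (the variable names `runs` and `series` are only illustrative):

```r
classes <- c("hist", "hist", "hist",
             "math", "math",
             "physics",
             "math", "math")

runs <- rle(classes)

runs$values   # one value per run
# "hist" "math" "physics" "math"
runs$lengths  # number of consecutive elements in each run
# 3 2 1 2

# Repeat run number k as many times as run k is long
series <- rep(seq_along(runs$lengths), times = runs$lengths)
series
# 1 1 1 2 2 3 4 4
```

Because "math" appears as two separate runs in `runs$values`, the two math blocks get different series numbers.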
0
votes

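One naive approach is a for loop:

series <- rep(1, nrow(df))
# seq_len(nrow(df))[-1] is empty for a one-row data frame, unlike 2:nrow(df)
for (i in seq_len(nrow(df))[-1]) {
  same <- identical(df$classes[i - 1], df$classes[i])
  # keep the current series number while the class repeats, otherwise start a new one
  series[i] <- if (same) series[i - 1] else series[i - 1] + 1
}
df$series <- series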
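
The same comparison of each element with its predecessor can also be written without an explicit loop. A minimal sketch, assuming classes has at least one element and no NA values (the name `is_new_run` is only illustrative):

```r
classes <- c("hist", "hist", "hist",
             "math", "math",
             "physics",
             "math", "math")

# TRUE at position 1 and wherever an element differs from the one before it
is_new_run <- c(TRUE, classes[-1] != classes[-length(classes)])

# Each TRUE starts a new series, so the running total is the series number
series <- cumsum(is_new_run)
series
# 1 1 1 2 2 3 4 4
```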