How do I count only previous value not using summarize in R?

Question

This is my dataset.

num col1
1   SENSOR_01
2   SENSOR_01
3   SENSOR_01
4   SENSOR_05
5   SENSOR_05
6   SENSOR_05
7   NA
8   SENSOR_01
9   SENSOR_01
10  SENSOR_05
11  SENSOR_05

structure(list(num = 1:11, col1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, NA, 1L, 1L, 2L, 2L), .Label = c("SENSOR_01", "SENSOR_05" ), class = "factor"), count = c(3L, 3L, 3L, 3L, 3L, 3L, 0L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, -11L))

I would like to count for only previous duplicated rows. In the row 1-3, there are sensor 3 repeatedly 3 times so count = 3. Here is my expected outcome.

num col1    count
1   SENSOR_01   3
2   SENSOR_01   3
3   SENSOR_01   3
4   SENSOR_05   3
5   SENSOR_05   3
6   SENSOR_05   3
7   NA          1
8   SENSOR_01   2
9   SENSOR_01   2
10  SENSOR_05   2
11  SENSOR_05   2

Using dplyr, How can I make this outcome?

utubun utubun · Accepted Answer · 2019-07-28T16:13:27

Like an option, we can use order of variables (rownames in traditional data.frame). The idea is simple:

If within the group of identical sensor names, the distance between adjacent records is equal to 1 and the same is true in a global view, without grouping - set the flag for this record to zero or one otherwise;
Still within the group of identical sensor names, find cumulative sum of flags, which allows us to identify all subgroups of records appearing consequently in global data set;
Still within the group count the number of elements in each individual subgroup;
Repeat for each group of records.

In tidyverse:

dat %>%
  mutate(tmp = 1:n()) %>%
  group_by(col1) %>%
  add_count(tmp = cumsum(c(0, diff(tmp)) > 1)) %>%
  ungroup() %>%
  select(-tmp)


# # A tibble: 11 x 3
#      num col1          n
#    <int> <fct>     <int>
#  1     1 SENSOR_01     3
#  2     2 SENSOR_01     3
#  3     3 SENSOR_01     3
#  4     4 SENSOR_05     3
#  5     5 SENSOR_05     3
#  6     6 SENSOR_05     3
#  7     7 NA            1
#  8     8 SENSOR_01     2
#  9     9 SENSOR_01     2
# 10    10 SENSOR_05     2
# 11    11 SENSOR_05     2

Data:

dat <- structure(
  list(
    num = 1:11, 
    col1 = structure(
      c(1L, 1L, 1L, 2L, 2L, 2L, NA, 1L, 1L, 2L, 2L), 
      .Label = c("SENSOR_01", "SENSOR_05" ), 
      class = "factor")
    ), 
  class = "data.frame", 
  row.names = c(NA, -11L)
  )

How do I count only previous value not using summarize in R?

3 Answers

data