I need to add a new column with dplyr's mutate that involves a conditional. I can't find a way to implement the following scheme in the tidyverse, but I can do it in Excel. That makes me feel like something of a barbarian. Does someone know how to accomplish this in the tidyverse?

  • The first value of the running.count column is 1, no matter what is in the "n" column.
  • After the first row, the conditional applies. If n = 1, running.count is the value from the row above + 1. If n = 0, running.count is the value from the row above + 1 only when this is the first 0 after a 1 in the "n" column; otherwise it is just the value from the row above.

Here's some toy data with the desired output:

data.frame("n"=c(0,1,0,0,0,0,1,0,1,1),"running.count"=c(1,2,3,3,3,3,4,5,6,7))

This is the output.
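
To make the rule concrete, here is a rough base R sketch that applies it row by row. It is not a tidyverse solution, only a literal reading of the two bullets above, and the function name running_count is made up for illustration.

```r
# Hypothetical helper: a literal, row-by-row reading of the rule.
running_count <- function(n) {
  out <- numeric(length(n))
  if (length(n) == 0) return(out)
  out[1] <- 1                          # the first row is always 1
  for (i in seq_along(n)[-1]) {
    # increment on every 1, and on a 0 that directly follows a 1
    bump <- n[i] == 1 || n[i - 1] == 1
    out[i] <- out[i - 1] + bump
  }
  out
}

toy <- data.frame(n = c(0, 1, 0, 0, 0, 0, 1, 0, 1, 1),
                  running.count = c(1, 2, 3, 3, 3, 3, 4, 5, 6, 7))

# compare the loop result with the desired column
all(running_count(toy$n) == toy$running.count)
```

The loop assumes n contains only 0s and 1s with no missing values.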

Can you try library(dplyr); library(data.table); df1 %>% group_by(running.count = rleid(n)) %>% mutate(ind = if(all(n==1)) duplicated(n) else FALSE) %>% ungroup %>% mutate(running.count = running.count + ind) %>% select(-ind) - akrun
Try df1 %>% group_by(running.count = rleid(n)) %>% mutate(ind = if(all(n==1)) row_number() - 1 else 0) %>% ungroup %>% mutate(running.count = rleid(running.count, ind)) %>% select(-ind) - akrun
The second code should give the expected output. - akrun
@akrun That looks correct to me. I would mark it as answered, but I don't see how to do that for you. Thank you so much!! - curiositasisasinbutstillcuriou
Or cumsum(c(1, diff(n) != 0 | n[-1] == 1)). - Rui Barradas
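
For reference, the cumsum one-liner from the last comment can be dropped straight into mutate. This is only a sketch: the name res is illustrative, and it assumes the data frame has at least one row and that n has no missing values.

```r
library(dplyr)

df1 <- data.frame(n = c(0, 1, 0, 0, 0, 0, 1, 0, 1, 1))

res <- df1 %>%
  # the first row counts as 1; afterwards add 1 whenever n changes
  # from the previous row or the current n is 1
  mutate(running.count = cumsum(c(1, diff(n) != 0 | n[-1] == 1)))

res$running.count
```

This needs only dplyr and base R, since diff and cumsum do the work that rleid does in the other suggestions.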

1 Answer

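We can use rleid from data.table to create the running.count column. A first rleid labels each run of identical n values, and a second one splits runs of 1s so that every 1 gets its own count: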

library(dplyr)
library(data.table)
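df1 %>% 
   # label each run of consecutive equal n values
   group_by(running.count = rleid(n)) %>% 
   # within a run of 1s, index the rows 0, 1, 2, ...; runs of 0s get 0 throughout
   mutate(ind = if (all(n == 1)) row_number() - 1 else 0) %>% 
   ungroup() %>% 
   # a new id starts whenever the run label or the within-run index changes
   mutate(running.count = rleid(running.count, ind)) %>% 
   select(-ind)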
# A tibble: 10 x 2
#       n running.count
#   <dbl>         <int>
# 1     0             1
# 2     1             2
# 3     0             3
# 4     0             3
# 5     0             3
# 6     0             3
# 7     1             4
# 8     0             5
# 9     1             6
#10     1             7
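
To see why this works, it can help to keep the intermediate columns instead of overwriting and dropping them. The following is a sketch of the same pipeline with the helper columns left in; the names run and steps are only illustrative.

```r
library(dplyr)
library(data.table)

df1 <- data.frame(n = c(0, 1, 0, 0, 0, 0, 1, 0, 1, 1))

steps <- df1 %>%
  # run: id of each block of consecutive equal n values
  mutate(run = rleid(n)) %>%
  group_by(run) %>%
  # ind: position within a block of 1s (0, 1, 2, ...), always 0 in blocks of 0s
  mutate(ind = if (all(n == 1)) row_number() - 1 else 0) %>%
  ungroup() %>%
  # a new count starts whenever the (run, ind) pair changes
  mutate(running.count = rleid(run, ind))

steps
```

For the toy data, run is 1, 2, 3, 3, 3, 3, 4, 5, 6, 6 and ind is 0 everywhere except the last row, where it is 1. The final rleid over the pair therefore splits the last block of two 1s into counts 6 and 7, while the block of four 0s stays at 3.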

data

df1 <- structure(list(n = c(0, 1, 0, 0, 0, 0, 1, 0, 1, 1)), 
   class = "data.frame", row.names = c(NA, -10L))