
I want to create a new column where each row is TRUE if a certain value appears in the previous n rows of another column (not counting the current row), and FALSE otherwise.

Here is an example dataframe (suppose this is a sample from a much larger dataframe):

A
2
23
1
5
6
15
14
3
7
9
55
3
77
2
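To make the answers runnable, the sample above can be recreated as a data frame. The values are copied from the column shown; storing them as doubles rather than integers is an assumption and makes no difference to the comparisons.

```r
# The sample column A from the question, as a plain data.frame
df <- data.frame(A = c(2, 23, 1, 5, 6, 15, 14, 3, 7, 9, 55, 3, 77, 2))
```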

And here is the desired output (with target value 1 and n = 10):

A   B
2   FALSE
23  FALSE
1   FALSE
5   TRUE
6   TRUE
15  TRUE
14  TRUE
3   TRUE
7   TRUE
9   TRUE
55  TRUE
3   TRUE
77  TRUE
2   FALSE

I can do this with dplyr's mutate and a long chain of "OR" conditions inside ifelse:

df <- df %>% mutate(B = ifelse(lag(A) == 1 | lag(A, 2) == 1 ... | lag(A, 10) == 1, TRUE, FALSE))

But this is far too tedious, especially when n is large. Also, dplyr's lag only accepts a single integer for n, so lag(A, 1:10) doesn't work.
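Since lag() takes one offset at a time, one way to avoid typing the chain while staying in dplyr is to call it once per offset with lapply() and fold the results with Reduce(). This is only a sketch: `target_value` and `n_back` are hypothetical names, and passing `default = FALSE` makes the first rows come out FALSE rather than the NA that lag() pads with by default.

```r
library(dplyr)

df <- data.frame(A = c(2, 23, 1, 5, 6, 15, 14, 3, 7, 9, 55, 3, 77, 2))

target_value <- 1  # value to look for (hypothetical name)
n_back <- 10       # look at the previous n_back rows, current row excluded

# One lagged copy of (A == target_value) per offset 1..n_back,
# then element-wise OR across all copies
df <- df %>%
  mutate(B = Reduce(`|`, lapply(seq_len(n_back), function(k) {
    dplyr::lag(A == target_value, k, default = FALSE)
  })))
```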

Is there an easy way to do this (preferably without a for loop)?
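When n is large, any approach that builds n shifted copies does O(N·n) work. A base-R alternative with no loop counts hits in the window using a single cumulative sum: the number of matches in rows (i - n) to (i - 1) is the running total up to row i - 1 minus the running total up to row i - n - 1. The function name `seen_in_last_n` is hypothetical, and treating NA values in the column as "not found" is an assumption.

```r
# TRUE where `value` occurs in the previous n elements of x (current element excluded)
seen_in_last_n <- function(x, value, n) {
  hit <- x == value
  hit[is.na(hit)] <- FALSE       # assumption: NA counts as "not found"
  cs <- c(0, cumsum(hit))        # cs[j + 1] = number of hits in elements 1..j
  i <- seq_along(x)
  upper <- cs[i]                 # hits in elements 1..(i - 1)
  lower <- cs[pmax(i - n, 1)]    # hits in elements 1..(i - n - 1); cs[1] = 0 if that range is empty
  (upper - lower) > 0
}

df <- data.frame(A = c(2, 23, 1, 5, 6, 15, 14, 3, 7, 9, 55, 3, 77, 2))
df$B <- seen_in_last_n(df$A, value = 1, n = 10)
```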


2 Answers


As you've noticed, dplyr's lag does not accept a vector of shift amounts. The shift function from data.table does: it covers the same ground as dplyr's lag and lead (through its type argument), and when n is a vector it returns a list with one shifted vector per offset. You can combine it with Reduce to OR those vectors together:

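library(data.table)
setDT(df)  # convert df to a data.table in place
# B is TRUE when any of the previous 10 rows has A == 1;
# fill = FALSE marks the padded positions at the top as "not found"
df[, B := Reduce("|", shift(A == 1, n = 1:10, fill = FALSE))]
df
#     A     B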
# 1:  2 FALSE
# 2: 23 FALSE
# 3:  1 FALSE
# 4:  5  TRUE
# 5:  6  TRUE
# 6: 15  TRUE
# 7: 14  TRUE
# 8:  3  TRUE
# 9:  7  TRUE
#10:  9  TRUE
#11: 55  TRUE
#12:  3  TRUE
#13: 77  TRUE
#14:  2 FALSE
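To see why this works, here is the same pattern on a short toy vector (values chosen only for illustration): with n = 1:2, shift() returns a list of two lagged copies, and Reduce("|", ...) ORs them element-wise.

```r
library(data.table)

# With a vector n, shift() returns a list: one lagged copy per element of n
lags <- shift(c(TRUE, FALSE, FALSE, FALSE), n = 1:2, fill = FALSE)
lags[[1]]            # lagged by 1: FALSE TRUE FALSE FALSE
lags[[2]]            # lagged by 2: FALSE FALSE TRUE FALSE

# Element-wise OR across the copies
Reduce("|", lags)    # FALSE TRUE TRUE FALSE
```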

We can also do this in a dplyr pipeline with do() and shift() from data.table:

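library(dplyr)
# data.table::shift() is called with its namespace, so data.table need not be attached
df %>%
    do(data.frame(., B = Reduce(`|`, data.table::shift(.$A == 1, n = 1:10, fill = FALSE))))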
#   A     B
#1   2 FALSE
#2  23 FALSE
#3   1 FALSE
#4   5  TRUE
#5   6  TRUE
#6  15  TRUE
#7  14  TRUE
#8   3  TRUE
#9   7  TRUE
#10  9  TRUE
#11 55  TRUE
#12  3  TRUE
#13 77  TRUE
#14  2 FALSE
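In current dplyr, do() is superseded, and the same Reduce/shift expression also works directly inside mutate(), since it returns one logical value per row. A sketch, assuming data.table is installed (it is only called through its namespace here):

```r
library(dplyr)

df <- data.frame(A = c(2, 23, 1, 5, 6, 15, 14, 3, 7, 9, 55, 3, 77, 2))

# shift() builds 10 lagged copies of (A == 1); Reduce ORs them row by row
df <- df %>%
  mutate(B = Reduce(`|`, data.table::shift(A == 1, n = 1:10, fill = FALSE)))
```

This keeps the original columns untouched and avoids the extra data.frame() wrapping that do() needs.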