How do I apply a function after group_by using dplyr to remove those groups with 2 or more consecutive NAs?

I have written a function that returns TRUE or FALSE depending on whether column b of a data frame contains a run of at least a given number of consecutive NAs:
# function for determining whether column b of df contains a run of
# at least `consecutive` NAs
is.na.contiguous <- function(df, consecutive) {
  na.rle <- rle(is.na(df$b))
  na.rle$values <- na.rle$values & na.rle$lengths >= consecutive
  any(na.rle$values)
}
# example df
d = structure(list(a = c(1, 2, 3, 4, 5, 6, 7, 8),
                   b = c(1, 2, 2, NA, NA, 2, NA, 2),
                   c = c(1, 1, 1, 2, 2, 2, 3, 3)),
              class = "data.frame", row.names = c(NA, -8L))
d
  a  b c
1 1  1 1
2 2  2 1
3 3  2 1
4 4 NA 2
5 5 NA 2
6 6  2 2
7 7 NA 3
8 8  2 3
# test function
is.na.contiguous(d, 2)
[1] TRUE   # column b has 2 consecutive NAs
is.na.contiguous(d, 3)
[1] FALSE  # column b does not have 3 consecutive NAs
Now how do I apply this function to each group in the dataframe? Below is what I have tried:
d %>% group_by(c) %>% mutate(consecNA = is.na.contiguous(.,2)) %>% as.data.frame()
  a  b c consecNA
1 1  1 1     TRUE
2 2  2 1     TRUE
3 3  2 1     TRUE
4 4 NA 2     TRUE
5 5 NA 2     TRUE
6 6  2 2     TRUE
7 7 NA 3     TRUE
8 8  2 3     TRUE
What am I doing wrong?
d %>% group_by(c) %>% mutate(consecNA = any(is.na(b) & lag(is.na(b), default = FALSE)))
To drop groups:
d %>% group_by(c) %>% filter(!any(is.na(b) & lag(is.na(b), default = FALSE)))
– alistaire
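
On why the attempt returns TRUE for every row: inside a magrittr pipe, `.` stands for the whole data frame that was piped in, not the current group, so `is.na.contiguous(., 2)` checks all of column b each time. One possible way around this, sketched below, is to have the helper take a vector instead of a data frame and pass it the column name, which dplyr evaluates per group. The name `has_na_run` is made up for this illustration and is not part of dplyr or of the original function; unlike the lag() approach, it works for any run length.

```r
library(dplyr)

# Hypothetical helper (name invented for this sketch): takes a vector
# rather than a data frame, so it can be applied to a column inside
# grouped dplyr verbs.
has_na_run <- function(x, n) {
  runs <- rle(is.na(x))
  any(runs$values & runs$lengths >= n)
}

# Same example data as in the question
d <- data.frame(
  a = 1:8,
  b = c(1, 2, 2, NA, NA, 2, NA, 2),
  c = c(1, 1, 1, 2, 2, 2, 3, 3)
)

# Flag each group: `b` here is only the current group's slice of the column
flagged <- d %>%
  group_by(c) %>%
  mutate(consecNA = has_na_run(b, 2)) %>%
  ungroup()

# Drop every group whose b contains a run of 2 or more NAs
kept <- d %>%
  group_by(c) %>%
  filter(!has_na_run(b, 2)) %>%
  ungroup()
```

With the example data, only group c = 2 should be flagged and removed, since group c = 3 has a single isolated NA.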