This is not so much a question about how to do something as about how to do it efficiently. In particular, I would like to drop NAs in a repeated measures design in such a way that each group has only complete observations.
In the bugs_long dataframe below, each participant takes part in four conditions and reports their desire to kill bugs in each condition. If I wanted to carry out a repeated measures analysis with this dataset, this typically doesn't work in the long format, because each group ends up with a different number of observations after the pairwise exclusion of NAs. So the final dataframe should leave out the following five subjects.
# setup
set.seed(123)
library(ipmisc)
library(tidyverse)
# looking at the NAs
dplyr::filter(bugs_long, is.na(desire))
#> # A tibble: 5 x 6
#>   subject gender region        education condition desire
#>     <int> <fct>  <fct>         <fct>     <chr>      <dbl>
#> 1       2 Female North America advance   LDHF          NA
#> 2      80 Female North America less      LDHF          NA
#> 3      42 Female North America high      HDLF          NA
#> 4      64 Female Europe        some      HDLF          NA
#> 5      10 Female Other         high      HDHF          NA
Here is the current roundabout way I am hacking this and getting it to work:
# figuring out the number of levels in the grouping factor
x_n_levels <- nlevels(as.factor(bugs_long$condition))[[1]]
# removing observations that don't have all repeated values
df <-
  bugs_long %>%
  filter(!is.na(condition)) %>%
  group_by(condition) %>%
  mutate(id = dplyr::row_number()) %>%
  ungroup(.) %>%
  filter(!is.na(desire)) %>%
  group_by(id) %>%
  mutate(n = dplyr::n()) %>%
  ungroup(.) %>%
  filter(n == x_n_levels) %>%
  select(-n)
# did this work? yes
df %>%
  group_by(condition) %>%
  count()
#> # A tibble: 4 x 2
#> # Groups:   condition [4]
#>   condition     n
#>   <chr>     <int>
#> 1 HDHF         88
#> 2 HDLF         88
#> 3 LDHF         88
#> 4 LDLF         88
But I would be surprised if the tidyverse (dplyr + tidyr) doesn't have a more efficient way to achieve this, and I would really appreciate it if anyone could suggest a better way to refactor this.
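For comparison, here is a minimal sketch of the kind of shorter version I have in mind. It assumes that the subject column uniquely identifies a participant across conditions, which I have not verified for bugs_long. The toy_long data below is made up purely for illustration and is not the real dataset; toy_complete and n_conditions are likewise just illustrative names.

```r
library(dplyr)

# Toy stand-in for bugs_long (hypothetical values, not the real data):
# 4 subjects x 4 conditions, with one NA each for subjects 2 and 3.
toy_long <- tibble(
  subject   = rep(1:4, times = 4),
  condition = rep(c("HDHF", "HDLF", "LDHF", "LDLF"), each = 4),
  desire    = c(6, 7, NA, 5,  4, 3, 8, 9,  2, NA, 1, 6,  7, 5, 4, 3)
)

# Number of conditions every subject is expected to have a row for
n_conditions <- n_distinct(toy_long$condition)

# Keep a subject only if none of their values is NA and they have
# one row per condition (assumes subject identifies a participant).
toy_complete <- toy_long %>%
  group_by(subject) %>%
  filter(!anyNA(desire), n() == n_conditions) %>%
  ungroup()

# Each condition should now have the same number of observations
count(toy_complete, condition)
```

The n() == n_conditions check is there to mirror the filter(n == x_n_levels) step above, in case a subject is missing a row entirely rather than having an explicit NA. If every subject is guaranteed to have one row per condition, the !anyNA(desire) part alone should be enough.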