tidyverse - filtering within a nested column/list based on number of NA's per row

Question

I'm trying to extend my own workflow from deleting columns (see [1] tidyverse - delete a column within a nested column/list) to filtering rows within a nested column/list. I found this potential solution: [2] Use filter() (and other dplyr functions) inside nested data frames with map()

My problem is that, in each "nest", I want to filter out the rows that are completely NA (i.e. I want to keep any row that has at least one non-missing value).

However, the simple solution in [2] doesn't work for me, probably because I want to filter on the sum of NA's per row, which might involve another map function within the filter.

(Note: I'm using the current github version of dplyr within tidyverse which offers some new experimental functions, like condense - which I'm using below, but I think that's not relevant for my problem/question).

I have the following data:

Data:

library(tidyverse)
library(corrr)

dat <- data.frame(grp = rep(1:4, each = 25),
                  Q1 = sample(c(1:5, NA), 100, replace = TRUE),
                  Q2 = sample(c(1:5, NA), 100, replace = TRUE),
                  Q3 = sample(c(1:5, NA), 100, replace = TRUE),
                  Q4 = sample(c(1:5, NA), 100, replace = TRUE),
                  Q5 = sample(c(NA), 100, replace = TRUE),
                  Q6 = sample(c(1:5, NA), 100, replace = TRUE))

I now calculate the correlations of Q1 to Q6 per group and delete the rowname column.

cor_dat <- dat %>%
  group_by(grp) %>%
  condense(cor = correlate(cur_data()) %>%
         select(-rowname)) %>%
  ungroup()

But adding this line to my pipeline doesn't work:

cor_dat <- cor_dat %>%
  mutate(cor = map(cor, ~ filter(., sum(is.na(.)) != ncol(.))))

I also tried the following, which doesn't work either:

cor_dat <- cor_dat %>%
      mutate(cor = map(cor, ~ filter(., !all(is.na(.)))))

Expected outcome in my data would be that the fifth row in each nest is filtered out.

Sure, I just forgot to copy this over from the other thread. I have edited my post accordingly. - deschen

akrun · Accepted Answer · 2020-02-17T21:53:10

Here, we can use filter_all with any_vars, which keeps every row where at least one column is not NA:

library(dplyr)
library(purrr)
cor_dat <- cor_dat %>% 
               mutate(cor = map(cor, ~ .x %>% 
                              filter_all(any_vars(!is.na(.)))))
cor_dat
# A tibble: 4 x 2
#    grp cor             
#  <int> <list>          
#1     1 <tibble [5 × 6]>
#2     2 <tibble [5 × 6]>
#3     3 <tibble [5 × 6]>
#4     4 <tibble [5 × 6]>

cor_dat$cor[[1]]
# A tibble: 5 x 6
#       Q1      Q2      Q3      Q4    Q5      Q6
#    <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl>
#1 NA      -0.226   0.288  -0.0536    NA  0.581 
#2 -0.226  NA      -0.382   0.212     NA  0.0274
#3  0.288  -0.382  NA      -0.0772    NA -0.153 
#4 -0.0536  0.212  -0.0772 NA         NA  0.0831
#5  0.581   0.0274 -0.153   0.0831    NA NA
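
In newer dplyr releases, filter_all() and any_vars() are superseded. The following is a minimal sketch of the same idea using if_any(), assuming dplyr 1.0.4 or later. The `nested` tibble and its values are made up for illustration and only stand in for cor_dat:

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for cor_dat: a list-column of small tibbles,
# each with one row that is NA in every column (the third row)
nested <- tibble(
  grp = 1:2,
  cor = list(
    tibble(Q1 = c(NA, 0.2, NA),  Q2 = c(0.2, NA, NA)),
    tibble(Q1 = c(NA, -0.5, NA), Q2 = c(-0.5, NA, NA))
  )
)

# Keep rows where at least one column is non-missing
nested_clean <- nested %>%
  mutate(cor = map(cor, function(tbl) {
    filter(tbl, if_any(everything(), ~ !is.na(.x)))
  }))

map_int(nested_clean$cor, nrow)
```

Each nested tibble should lose only its all-NA row, so both end up with two rows.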

Or, if we need to use filter, we can build the row-wise condition with rowSums:

cor_dat %>%
       mutate(cor = map(cor, ~ .x %>%
                                  filter(rowSums(is.na(.)) < ncol(.))))
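
To see why the rowSums version behaves differently from the attempts in the question, here is a small sketch on a made-up tibble (`toy` is hypothetical, not part of the original data). sum(is.na(.)) and all(is.na(.)) collapse the whole table to a single value, which filter() then recycles to every row, whereas rowSums(is.na(.)) returns one count per row:

```r
library(dplyr)

# Hypothetical toy data: the second row is NA in every column
toy <- tibble(a = c(1, NA, 3), b = c(NA, NA, 6))

total_na   <- sum(is.na(toy))      # a single count for the whole table
per_row_na <- rowSums(is.na(toy))  # one count per row

# Scalar condition: recycled to all rows, so no row is dropped
kept_scalar <- filter(toy, sum(is.na(toy)) != ncol(toy))

# Row-wise condition: drops only the row where every column is NA
kept_rowwise <- filter(toy, rowSums(is.na(toy)) < ncol(toy))
```

Here kept_scalar still has all three rows, while kept_rowwise has only the first and third.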

data

cor_dat <- dat %>%
   group_by(grp) %>%
   condense(cor = correlate(cur_data()) %>% 
                 select(-rowname)) %>% 
   ungroup
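
condense() was an experimental function in the development version of dplyr, so it may not be available in your installed version. As a rough sketch under that assumption, a similar nested structure can be built with tidyr::nest() and purrr::map(). Note that this swaps corrr::correlate() for base cor(), with the diagonal set to NA by hand to mimic the output shown above; the seed and the reduced set of columns are arbitrary choices for illustration:

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Smaller version of the question's data; Q5 is entirely NA
dat <- data.frame(grp = rep(1:4, each = 25),
                  Q1 = sample(c(1:5, NA), 100, replace = TRUE),
                  Q2 = sample(c(1:5, NA), 100, replace = TRUE),
                  Q5 = NA_real_)

cor_dat <- dat %>%
  nest(data = -grp) %>%                      # one nested tibble per group
  mutate(cor = map(data, function(d) {
    m <- cor(d, use = "pairwise.complete.obs")
    diag(m) <- NA                            # assumption: mimic correlate()'s NA diagonal
    as_tibble(m)
  })) %>%
  select(grp, cor)
```

The resulting cor_dat has one row per group and a list-column cor, so the filtering steps above can be applied to it unchanged.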
